HyperAI


5 months ago

Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention

Liu Xiangcheng; Wu Tianyi; Guo Guodong


Abstract

The vision transformer (ViT) has emerged as a new paradigm in computer vision, showing excellent performance but at an expensive computational cost. Image token pruning is one of the main approaches to ViT compression, because the complexity is quadratic in the number of tokens and many tokens that contain only background regions do not truly contribute to the final prediction. Existing works either rely on additional modules to score the importance of individual tokens or apply a fixed pruning ratio to every input instance. In this work, we propose an adaptive sparse token pruning framework with minimal cost. Specifically, we first propose an inexpensive scoring mechanism based on class attention weighted by attention-head importance. Then, learnable parameters are inserted as thresholds to distinguish informative tokens from unimportant ones. By comparing token attention scores with these thresholds, we can discard useless tokens hierarchically and thus accelerate inference. The learnable thresholds are optimized in budget-aware training to balance accuracy and complexity, yielding a pruning configuration tailored to each input instance. Extensive experiments demonstrate the effectiveness of our approach. Our method improves the throughput of DeiT-S by 50% with only a 0.2% drop in top-1 accuracy, achieving a better trade-off between accuracy and latency than previous methods.
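The scoring and thresholding steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the official code is in the repository listed below): the head-importance weights are treated as given inputs, pruned tokens are tracked with a boolean mask rather than physically removed, and the sigmoid relaxation and squared budget penalty used for training are common choices assumed here, not details confirmed by the abstract. All function names are hypothetical.

```python
import math


def class_attention_scores(attn, head_weights):
    """Score each patch token by head-importance-weighted class attention.

    attn: nested list [H][N][N] of softmax attention probabilities for one
        layer; index 0 is the [CLS] token.
    head_weights: H non-negative head importances. They are a given input
        here; how they are computed is an assumption left open.
    Returns N - 1 scores, one per patch token.
    """
    total = sum(head_weights)
    weights = [w / total for w in head_weights]  # normalize to sum to 1
    num_tokens = len(attn[0])
    # Row 0 of each head is the [CLS] query attending to every token.
    return [
        sum(w * attn[h][0][j] for h, w in enumerate(weights))
        for j in range(1, num_tokens)
    ]


def hard_keep_mask(scores, threshold, prev_mask=None):
    """Inference-time rule: keep a token if its score exceeds the threshold
    and it survived every earlier stage. Pruned tokens are tracked with a
    mask here instead of being physically removed from the sequence."""
    mask = [s > threshold for s in scores]
    if prev_mask is not None:
        mask = [m and p for m, p in zip(mask, prev_mask)]
    return mask


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_keep_mask(scores, threshold, temperature=0.01):
    """Assumed training-time relaxation: sigmoid((s - theta) / tau), which is
    differentiable with respect to the threshold theta."""
    return [_sigmoid((s - threshold) / temperature) for s in scores]


def budget_loss(soft_mask, target_keep_ratio):
    """Hypothetical budget penalty: squared gap between the expected keep
    ratio and a target keep ratio."""
    keep_ratio = sum(soft_mask) / len(soft_mask)
    return (keep_ratio - target_keep_ratio) ** 2


if __name__ == "__main__":
    # Toy layer: 2 heads, 4 tokens ([CLS] + 3 patches). Only row 0 matters
    # for scoring, so the other rows are filled uniformly.
    uniform = [0.25, 0.25, 0.25, 0.25]
    attn = [
        [[0.1, 0.6, 0.2, 0.1], uniform, uniform, uniform],  # head 0
        [[0.1, 0.2, 0.3, 0.4], uniform, uniform, uniform],  # head 1
    ]
    scores = class_attention_scores(attn, head_weights=[3.0, 1.0])
    print(scores)  # approximately [0.5, 0.225, 0.175]

    # Two pruning stages. For brevity the same scores are reused; a real
    # model would recompute them at the later layer.
    stage1 = hard_keep_mask(scores, threshold=0.2)
    stage2 = hard_keep_mask(scores, threshold=0.3, prev_mask=stage1)
    print(stage1, stage2)  # [True, True, False] [True, False, False]

    soft = soft_keep_mask(scores, threshold=0.2)
    print(budget_loss(soft, target_keep_ratio=0.5))  # about 0.0278
```

The hard comparison is what makes the pruning adaptive at inference: how many tokens survive depends on how concentrated each image's class attention is, rather than on a fixed ratio shared by all inputs.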

Code Repositories

cydia2018/as-vit (official PyTorch implementation; mentioned in GitHub)

Benchmarks

Benchmark                                     Method            GFLOPs   Top-1 Accuracy (%)
efficient-vits-on-imagenet-1k-with-deit-s     AS-DeiT-S (50%)   2.3      78.7
efficient-vits-on-imagenet-1k-with-deit-s     AS-DeiT-S (65%)   3.0      79.6
efficient-vits-on-imagenet-1k-with-lv-vit-s   AS-LV-S (60%)     3.9      82.6
efficient-vits-on-imagenet-1k-with-lv-vit-s   AS-LV-S (70%)     4.6      83.1
