HyperAI


Visual Prompt Tuning

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, Ser-Nam Lim


Abstract

The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter-efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost.
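The core mechanism the abstract describes, a small set of trainable prompt vectors prepended to the input sequence of a frozen Transformer, can be sketched in a few lines of PyTorch. The sketch below is illustrative, not the paper's exact configuration: the toy backbone, dimensions, prompt count, and initialization are assumptions, and only the VPT-Shallow variant (prompts at the first layer only) is shown.

```python
import torch
import torch.nn as nn

class VPTShallow(nn.Module):
    """Minimal sketch of VPT-Shallow: learnable prompt tokens are
    prepended to the patch-token sequence of a frozen Transformer;
    only the prompts and the classification head are trained."""

    def __init__(self, backbone: nn.Module, embed_dim: int,
                 num_prompts: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the backbone frozen
        # The only new parameters in the input space.
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.uniform_(self.prompts, -0.5, 0.5)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch embeddings with the [CLS] token first.
        batch = tokens.shape[0]
        prompts = self.prompts.expand(batch, -1, -1)
        # Insert the prompt tokens between [CLS] and the patch tokens.
        x = torch.cat([tokens[:, :1], prompts, tokens[:, 1:]], dim=1)
        x = self.backbone(x)
        return self.head(x[:, 0])  # classify from the [CLS] position

# Toy frozen "backbone": two standard encoder layers (illustrative only).
dim = 32
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2)
model = VPTShallow(backbone, embed_dim=dim, num_prompts=5, num_classes=10)

tokens = torch.randn(2, 17, dim)  # batch of 2: [CLS] + 16 patch tokens
logits = model(tokens)            # shape (2, 10)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
```

Here `trainable` counts only the prompts (5 x 32 = 160 values) plus the head, a tiny fraction of `total`, which is what gives VPT its small per-task storage cost: each downstream task only needs its own prompts and head, while the frozen backbone is shared.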

Code Repositories

KMnP/vpt (official, PyTorch)
wgcban/apt (PyTorch)
heekhero/DTL (PyTorch)
TooTouch/VPT (PyTorch)
Yiming-M/CLIP-EBC (PyTorch)

Benchmarks

Benchmark | Method | Metric
long-tail-learning-on-cifar-100-lt-r-10 | VPT | Error Rate: 10.4
long-tail-learning-on-cifar-100-lt-r-100 | VPT | Error Rate: 19
long-tail-learning-on-cifar-100-lt-r-50 | VPT | Error Rate: 15.2
prompt-engineering-on-imagenet-21k | VPT | Accuracy: 24.8
visual-prompt-tuning-on-fgvc | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy: 72.02
visual-prompt-tuning-on-fgvc | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy: 57.84
visual-prompt-tuning-on-fgvc | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy: 83.12
visual-prompt-tuning-on-fgvc | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy: 79.26
visual-prompt-tuning-on-vtab-1k-natural-7 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy: 67.34
visual-prompt-tuning-on-vtab-1k-natural-7 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy: 39.96
visual-prompt-tuning-on-vtab-1k-natural-7 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy: 70.27
visual-prompt-tuning-on-vtab-1k-natural-7 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy: 36.02
visual-prompt-tuning-on-vtab-1k-specialized-4 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy: 60.61
visual-prompt-tuning-on-vtab-1k-specialized-4 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy: 69.65
visual-prompt-tuning-on-vtab-1k-specialized-4 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy: 83.04
visual-prompt-tuning-on-vtab-1k-specialized-4 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy: 82.26
visual-prompt-tuning-on-vtab-1k-structured-8 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy: 37.55
visual-prompt-tuning-on-vtab-1k-structured-8 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy: 42.38
visual-prompt-tuning-on-vtab-1k-structured-8 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy: 26.57
visual-prompt-tuning-on-vtab-1k-structured-8 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy: 27.50
