Xiaohua Zhai; Xiao Wang; Basil Mustafa; Andreas Steiner; Daniel Keysers; Alexander Kolesnikov; Lucas Beyer

Abstract
This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. In our empirical study we find that locked pre-trained image models with unlocked text models work best. We call this instance of contrastive-tuning "Locked-image Tuning" (LiT), which just teaches a text model to read out good representations from a pre-trained image model for new tasks. A LiT model gains the capability of zero-shot transfer to new vision tasks, such as image classification or retrieval. The proposed LiT is widely applicable; it works reliably with multiple pre-training methods (supervised and unsupervised) and across diverse architectures (ResNet, Vision Transformers and MLP-Mixer) using three different image-text datasets. With the transformer-based pre-trained ViT-g/14 model, the LiT model achieves 85.2% zero-shot transfer accuracy on the ImageNet test set, and 82.5% on the challenging out-of-distribution ObjectNet test set.
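As a rough illustration of the recipe described above (not the paper's code), the sketch below freezes a stand-in pre-trained image tower and trains only a text tower plus a learnable temperature with a symmetric image-text contrastive loss. The tiny PyTorch encoders, shapes, and hyperparameters are hypothetical placeholders for a pre-trained ViT and a transformer text model.

```python
# Minimal sketch of Locked-image Tuning (LiT): the image tower is frozen
# (pre-trained weights, no gradients) and only the text tower plus a learned
# temperature are trained with a symmetric CLIP-style contrastive loss.
# TinyImageEncoder / TinyTextEncoder are hypothetical stand-ins; shapes,
# vocab size, and learning rate are illustrative only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyImageEncoder(nn.Module):          # stand-in for a pre-trained ViT-g/14
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
    def forward(self, x):
        return self.net(x)

class TinyTextEncoder(nn.Module):           # stand-in for a transformer text tower
    def __init__(self, vocab=1000, dim=256):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)
    def forward(self, tokens):
        return self.emb(tokens)

image_tower = TinyImageEncoder()
text_tower = TinyTextEncoder()

# "Locked" image tower: freeze all parameters so only the text side adapts.
for p in image_tower.parameters():
    p.requires_grad = False
image_tower.eval()

log_temperature = nn.Parameter(torch.tensor(2.3))   # learnable softmax temperature
optimizer = torch.optim.Adam(
    list(text_tower.parameters()) + [log_temperature], lr=1e-4)

def contrastive_step(images, tokens):
    """One training step: symmetric image-text contrastive loss."""
    with torch.no_grad():                    # no gradients through the frozen tower
        img_emb = F.normalize(image_tower(images), dim=-1)
    txt_emb = F.normalize(text_tower(tokens), dim=-1)

    logits = img_emb @ txt_emb.t() * log_temperature.exp()
    labels = torch.arange(images.size(0))    # matching pairs lie on the diagonal
    loss = 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch just to show the call pattern.
loss = contrastive_step(torch.randn(8, 3, 32, 32), torch.randint(0, 1000, (8, 16)))
print(f"contrastive loss: {loss:.3f}")
```

In practice the locked tower would be a strong pre-trained backbone such as the ViT-g/14 mentioned above, and because no gradients flow through it, its image embeddings can even be pre-computed once and reused across epochs.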
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| image-classification-on-objectnet | LiT | Top-1 Accuracy: 82.5 |
| zero-shot-transfer-image-classification-on-1 | LiT-tuning | Accuracy (Private): 84.5; Accuracy (Public): 75.7 |
| zero-shot-transfer-image-classification-on-3 | LiT-tuning | Accuracy (Private): 78.7; Accuracy (Public): 66.6 |
| zero-shot-transfer-image-classification-on-4 | LiT-tuning | Accuracy: 93.9 |
| zero-shot-transfer-image-classification-on-5 | LiT-tuning | Accuracy (Private): 79.4; Accuracy (Public): 37.8 |
| zero-shot-transfer-image-classification-on-6 | LiT-tuning | Accuracy (Private): 81.1; Accuracy (Public): 54.5 |
| zero-shot-transfer-image-classification-on-7 | LiT-tuning | Accuracy (Private): 88.0; Accuracy (Public): 82.2 |
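For context on how accuracies like those in the table above are typically produced with a dual-encoder model, here is a minimal, hypothetical sketch of zero-shot classification: class names (usually wrapped in prompt templates such as "a photo of a {class}") are embedded once by the text tower, and each image is assigned the class with the most similar embedding. The function and dummy tensors below are illustrative, not the evaluation code behind these benchmarks.

```python
# Sketch of zero-shot top-1 accuracy from pre-computed dual-encoder embeddings.
# Inputs are assumed to come from a trained image tower and text tower; here
# they are random tensors just to exercise the function.

import torch
import torch.nn.functional as F

def zero_shot_accuracy(image_embeddings, class_text_embeddings, labels):
    """Top-1 accuracy via nearest class embedding in the shared space."""
    img = F.normalize(image_embeddings, dim=-1)        # (N, D)
    txt = F.normalize(class_text_embeddings, dim=-1)   # (C, D)
    predictions = (img @ txt.t()).argmax(dim=-1)       # nearest class per image
    return (predictions == labels).float().mean().item()

# Dummy data: 100 images, 10 classes, 256-dim embeddings.
images = torch.randn(100, 256)
classes = torch.randn(10, 256)
labels = torch.randint(0, 10, (100,))
print(f"top-1 accuracy: {zero_shot_accuracy(images, classes, labels):.2%}")
```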