LiT: Zero-Shot Transfer with Locked-image text Tuning

Xiaohua Zhai; Xiao Wang; Basil Mustafa; Andreas Steiner; Daniel Keysers; Alexander Kolesnikov; Lucas Beyer

Abstract

This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. In our empirical study we find that locked pre-trained image models with unlocked text models work best. We call this instance of contrastive-tuning "Locked-image Tuning" (LiT), which just teaches a text model to read out good representations from a pre-trained image model for new tasks. A LiT model gains the capability of zero-shot transfer to new vision tasks, such as image classification or retrieval. The proposed LiT is widely applicable; it works reliably with multiple pre-training methods (supervised and unsupervised) and across diverse architectures (ResNet, Vision Transformers and MLP-Mixer) using three different image-text datasets. With the transformer-based pre-trained ViT-g/14 model, the LiT model achieves 85.2% zero-shot transfer accuracy on the ImageNet test set, and 82.5% on the challenging out-of-distribution ObjectNet test set.
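The two ideas in the abstract, contrastive alignment against a locked image model and zero-shot classification by reading out text embeddings, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a symmetric softmax contrastive loss over cosine similarities (the abstract says only "contrastive training"), uses toy hand-written embeddings in place of real encoders, and every function name and the temperature value are hypothetical. In LiT the image embeddings would come from a frozen pre-trained model, so only the text side would receive gradient updates. The sketch computes the loss and the zero-shot decision and performs no training.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(image_embs, text_embs, temperature):
    """Return logits[i][j] = cos(image_i, text_j) / temperature."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    peak = max(row)
    lse = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric contrastive loss (assumed form): pair i of images and texts is
    the positive, and all other pairings in the batch act as negatives."""
    logits = similarity_matrix(image_embs, text_embs, temperature)
    n = len(logits)
    # Image-to-text direction: each row should peak on its diagonal entry.
    img_to_txt = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: same computation on the transposed matrix.
    columns = [list(col) for col in zip(*logits)]
    txt_to_img = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (img_to_txt + txt_to_img)


def zero_shot_classify(image_emb, class_text_embs):
    """Return the index of the class whose text embedding is most similar."""
    scores = similarity_matrix([image_emb], class_text_embs, 1.0)[0]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: the image embeddings stand in for a locked image model's output.
locked_image_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_text_embs = [[2.0, 0.0], [0.0, 3.0]]     # texts match their images
misaligned_text_embs = [[0.0, 1.0], [1.0, 0.0]]  # texts swapped

loss_aligned = contrastive_loss(locked_image_embs, aligned_text_embs)
loss_misaligned = contrastive_loss(locked_image_embs, misaligned_text_embs)
predicted_class = zero_shot_classify([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]])
```

Here the aligned text embeddings give a much lower loss than the swapped ones, which is the signal a text model would be trained on. Zero-shot classification then needs only one text embedding per class name and no task-specific training.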

Code Repositories

mlfoundations/open_clip (pytorch; mentioned in GitHub)
google-research/big_vision (Official; jax; mentioned in GitHub)
google-research/vision_transformer (Official; jax; mentioned in GitHub)
laion-ai/clip_benchmark (pytorch; mentioned in GitHub)
eify/clip_benchmark (pytorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
image-classification-on-objectnet | LiT | Top-1 Accuracy: 82.5
zero-shot-transfer-image-classification-on-1 | LiT-tuning | Accuracy (Private): 84.5; Accuracy (Public): 75.7
zero-shot-transfer-image-classification-on-3 | LiT-tuning | Accuracy (Private): 78.7; Accuracy (Public): 66.6
zero-shot-transfer-image-classification-on-4 | LiT-tuning | Accuracy: 93.9
zero-shot-transfer-image-classification-on-5 | LiT-tuning | Accuracy (Private): 79.4; Accuracy (Public): 37.8
zero-shot-transfer-image-classification-on-6 | LiT-tuning | Accuracy (Private): 81.1; Accuracy (Public): 54.5
zero-shot-transfer-image-classification-on-7 | LiT-tuning | Accuracy (Private): 88.0; Accuracy (Public): 82.2
