Quan Sun; Jinsheng Wang; Qiying Yu; Yufeng Cui; Fan Zhang; Xiaosong Zhang; Xinlong Wang

Abstract
Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models. We present EVA-CLIP-18B, the largest and most powerful open-source CLIP model to date, with 18 billion parameters. With only 6 billion training samples seen, EVA-CLIP-18B achieves an exceptional 80.7% zero-shot top-1 accuracy averaged across 27 widely recognized image classification benchmarks, outperforming its forerunner EVA-CLIP (5 billion parameters) and other open-source CLIP models by a large margin. Remarkably, we observe a consistent performance improvement as the model size of EVA-CLIP scales up, even though the training dataset is held constant at 2 billion image-text pairs from LAION-2B and COYO-700M. This dataset is openly available and much smaller than the in-house datasets (e.g., DFN-5B, WebLI-10B) employed in other state-of-the-art CLIP models. EVA-CLIP-18B demonstrates the potential of EVA-style weak-to-strong visual model scaling. With our model weights made publicly available, we hope to facilitate future research in vision and multimodal foundation models.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| zero-shot-transfer-image-classification-on-1 | EVA-CLIP-18B | Accuracy (Private): 83.8 |
| zero-shot-transfer-image-classification-on-17 | EVA-CLIP-18B | Top 1 Accuracy: 95.8 |
| zero-shot-transfer-image-classification-on-2 | EVA-CLIP-18B | Accuracy: 77.7 |
| zero-shot-transfer-image-classification-on-3 | EVA-CLIP-18B | Accuracy (Private): 77.9 |
| zero-shot-transfer-image-classification-on-4 | EVA-CLIP-18B | Accuracy: 95.7 |
| zero-shot-transfer-image-classification-on-5 | EVA-CLIP-18B | Accuracy (Private): 87.3 |
| zero-shot-transfer-image-classification-on-6 | EVA-CLIP-18B | Accuracy (Private): 82.2 |
| zero-shot-transfer-image-classification-on-8 | EVA-CLIP-18B | Accuracy (Private): 74.7 |
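
The accuracies above are zero-shot classification scores. As a rough illustration of how a zero-shot top-1 accuracy can be computed with a CLIP-style model, the sketch below picks, for each image embedding, the class whose text embedding has the highest cosine similarity, then reports the percentage of correct picks. This is not the authors' evaluation code: the function names (`l2_normalize`, `zero_shot_predict`, `top1_accuracy`) are hypothetical, and the two-dimensional embeddings are made-up toy values standing in for real encoder outputs.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_predict(image_emb, class_text_embs):
    """Return the index of the class text embedding most similar to the image."""
    img = l2_normalize(image_emb)
    # Cosine similarity is the dot product of unit-length vectors.
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(text_emb)))
        for text_emb in class_text_embs
    ]
    return max(range(len(sims)), key=sims.__getitem__)


def top1_accuracy(image_embs, labels, class_text_embs):
    """Percentage of images whose most similar class matches the label."""
    correct = sum(
        zero_shot_predict(emb, class_text_embs) == label
        for emb, label in zip(image_embs, labels)
    )
    return 100.0 * correct / len(labels)


# Toy data (assumed for illustration only): one text embedding per class,
# e.g. from prompts such as "a photo of a {class name}".
class_text_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
image_embs = [[0.9, 0.1], [0.2, 0.8], [-0.7, 0.3], [0.6, 0.5]]
labels = [0, 1, 2, 1]

accuracy = top1_accuracy(image_embs, labels, class_text_embs)
print(f"zero-shot top-1 accuracy: {accuracy:.1f}%")
```

In this toy setup the last image is closer to class 0 than to its labelled class 1, so three of four predictions are correct. A benchmark average such as the 80.7% quoted in the abstract would then be the mean of per-dataset accuracies computed in this manner, under whatever prompts and preprocessing the evaluation protocol specifies.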