Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
Hu Hexiang ; Luan Yi ; Chen Yang ; Khandelwal Urvashi ; Joshi Mandar ; Lee Kenton ; Toutanova Kristina ; Chang Ming-Wei

Abstract
Large-scale multi-modal pre-training models such as CLIP and PaLI exhibit strong generalization on various visual domains and tasks. However, existing image classification benchmarks often evaluate recognition on a specific domain (e.g., outdoor images) or a specific task (e.g., classifying plant species), which falls short of evaluating whether pre-trained foundational models are universal visual recognizers. To address this, we formally present the task of Open-domain Visual Entity recognitioN (OVEN), where a model needs to link an image onto a Wikipedia entity with respect to a text query. We construct OVEN-Wiki by re-purposing 14 existing datasets with all labels grounded onto one single label space: Wikipedia entities. OVEN challenges models to select among six million possible Wikipedia entities, making it a general visual recognition benchmark with the largest number of labels. Our study on state-of-the-art pre-trained models reveals large headroom in generalizing to the massive-scale label space. We show that a PaLI-based auto-regressive visual recognition model performs surprisingly well, even on Wikipedia entities that have never been seen during fine-tuning. We also find existing pre-trained models yield different strengths: while PaLI-based models obtain higher overall performance, CLIP-based models are better at recognizing tail entities.
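To make the task format concrete, the following is a minimal sketch of entity linking framed as nearest-neighbor retrieval over a shared label space, scored by accuracy. It is an illustration only and not the method of the paper: the embeddings are hand-written toy vectors standing in for the output of some encoder applied to an image and its text query, the entity ids (`Q_airliner`, etc.) are hypothetical placeholders rather than real Wikipedia identifiers, and the helper names `link_entity` and `accuracy` are assumptions introduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_entity(query_vec, entity_vecs):
    """Return the id of the entity whose embedding is closest to the query."""
    return max(entity_vecs, key=lambda eid: cosine(query_vec, entity_vecs[eid]))


def accuracy(predictions, gold):
    """Percentage of predictions that exactly match the gold entity id."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Toy label space: three hypothetical entities instead of millions.
ENTITY_VECS = {
    "Q_airliner": [0.9, 0.1, 0.0],
    "Q_oak_tree": [0.0, 0.8, 0.2],
    "Q_lighthouse": [0.1, 0.0, 0.9],
}

# Each example pairs a toy (image + text query) embedding with its gold entity.
EXAMPLES = [
    ([0.8, 0.2, 0.1], "Q_airliner"),
    ([0.1, 0.9, 0.1], "Q_oak_tree"),
    ([0.2, 0.7, 0.3], "Q_lighthouse"),  # lies closest to Q_oak_tree, so it is missed
]

preds = [link_entity(vec, ENTITY_VECS) for vec, _ in EXAMPLES]
score = accuracy(preds, [gold for _, gold in EXAMPLES])
print(preds, round(score, 1))
```

In this toy setup two of the three examples are linked correctly. A real system would replace the hand-written vectors with model outputs and would need an efficient index to search a label space of millions of entities.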
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| fine-grained-image-recognition-on-oven | PaLI (17B) | Accuracy: 20.2 |
| fine-grained-image-recognition-on-oven | CLIP2CLIP | Accuracy: 5.3 |
| fine-grained-image-recognition-on-oven | PaLI (3B) | Accuracy: 11.8 |