Abstract
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. PaLI-X advances the state-of-the-art on most vision-and-language benchmarks considered (25+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| chart-question-answering-on-chartqa | PaLI-X (Single-task FT) | 1:1 Accuracy: 70.9 |
| chart-question-answering-on-chartqa | PaLI-X (Multi-task FT) | 1:1 Accuracy: 70.6 |
| chart-question-answering-on-chartqa | PaLI-X (Single-task FT w/ OCR) | 1:1 Accuracy: 72.3 |
| fine-grained-image-recognition-on-oven | PaLI-X | Accuracy: 23.1 |
| temporal-casual-qa-on-next-qa | PaLI-X | WUPS: 38.3 |
| visual-question-answering-on-docvqa-test | PaLI-X (Single-task FT w/ OCR) | ANLS: 0.868 |
| visual-question-answering-on-docvqa-test | PaLI-X (Single-task FT) | ANLS: 0.80 |
| visual-question-answering-on-docvqa-test | PaLI-X (Multi-task FT) | ANLS: 0.809 |
| visual-question-answering-on-ok-vqa | PaLI-X (Single-task FT) | Accuracy: 66.1 |
| visual-question-answering-vqa-on | PaLI-X (Single-task FT) | ANLS: 49.2 |
| visual-question-answering-vqa-on | PaLI-X (Multi-task FT) | ANLS: 50.7 |
| visual-question-answering-vqa-on | PaLI-X (Single-task FT w/ OCR) | ANLS: 54.8 |
| visual-question-answering-vqa-on-infoseek | PaLI-X | Accuracy: 24 |