HyperAI


5 months ago

PaLI-X: On Scaling up a Multilingual Vision and Language Model

Abstract

We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of the size of its components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. PaLI-X advances the state of the art on most vision-and-language benchmarks considered (25+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.

Code Repositories

doc-doc/NExT-OE (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
----------|-------------|--------
chart-question-answering-on-chartqa | PaLI-X (Single-task FT) | 1:1 Accuracy: 70.9
chart-question-answering-on-chartqa | PaLI-X (Multi-task FT) | 1:1 Accuracy: 70.6
chart-question-answering-on-chartqa | PaLI-X (Single-task FT w/ OCR) | 1:1 Accuracy: 72.3
fine-grained-image-recognition-on-oven | PaLI-X | Accuracy: 23.1
temporal-casual-qa-on-next-qa | PaLI-X | WUPS: 38.3
visual-question-answering-on-docvqa-test | PaLI-X (Single-task FT w/ OCR) | ANLS: 0.868
visual-question-answering-on-docvqa-test | PaLI-X (Single-task FT) | ANLS: 0.80
visual-question-answering-on-docvqa-test | PaLI-X (Multi-task FT) | ANLS: 0.809
visual-question-answering-on-ok-vqa | PaLI-X (Single-task FT) | Accuracy: 66.1
visual-question-answering-vqa-on | PaLI-X (Single-task FT) | ANLS: 49.2
visual-question-answering-vqa-on | PaLI-X (Multi-task FT) | ANLS: 50.7
visual-question-answering-vqa-on | PaLI-X (Single-task FT w/ OCR) | ANLS: 54.8
visual-question-answering-vqa-on-infoseek | PaLI-X | Accuracy: 24
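Several rows above report ANLS, on a 0 to 1 scale for DocVQA and scaled by 100 for the other VQA rows. To illustrate what that metric measures, here is a minimal Python sketch of ANLS (Average Normalized Levenshtein Similarity) as it is commonly defined for ST-VQA and DocVQA. Each prediction is compared with every ground-truth answer, and the best similarity 1 - NL is kept. Any similarity whose normalized edit distance NL reaches the threshold tau = 0.5 counts as zero. The function names and the normalization (lowercasing, stripping surrounding whitespace) are illustrative assumptions. This is not the official evaluation script and not code from the PaLI-X authors.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_levenshtein(pred: str, gt: str) -> float:
    """Edit distance divided by the longer string's length, in [0, 1]."""
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(pred, gt) / longest


def anls(predictions: Sequence[str],
         ground_truths: Sequence[Sequence[str]],
         tau: float = 0.5) -> float:
    """Mean over questions of the best thresholded similarity per question."""
    if not predictions or len(predictions) != len(ground_truths):
        raise ValueError("predictions must be non-empty and aligned with ground_truths")
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        best = 0.0
        for gt in answers:
            # Assumed normalization: case-insensitive, surrounding whitespace ignored.
            nl = normalized_levenshtein(pred.strip().lower(), gt.strip().lower())
            score = 1.0 - nl if nl < tau else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)


print(anls(["acme corp", "hello"], [["acme corp."], ["world"]]))  # 0.45
```

In this example, the near miss "acme corp" against "acme corp." still earns 0.9 (one edit over ten characters). The unrelated "hello" against "world" has NL = 0.8, which is above the threshold, so it earns nothing. Thresholding in this way tolerates small OCR-style errors without rewarding wrong answers.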
