Prismer: A Vision-Language Model with Multi-Task Experts

Shikun Liu Linxi Fan Edward Johns Zhiding Yu Chaowei Xiao Anima Anandkumar

Abstract

Recent vision-language models have shown impressive multi-modal generation capabilities. However, they typically require training huge models on massive datasets. As a more scalable alternative, we introduce Prismer, a data- and parameter-efficient vision-language model that leverages an ensemble of task-specific experts. Prismer requires training only a small number of components; the majority of the network weights are inherited from multiple readily available, pre-trained experts and kept frozen during training. By leveraging experts from a wide range of domains, we show that Prismer can efficiently pool this expert knowledge and adapt it to various vision-language reasoning tasks. In our experiments, Prismer achieves fine-tuned and few-shot learning performance that is competitive with the current state of the art, whilst requiring up to two orders of magnitude less training data. Code is available at https://github.com/NVlabs/prismer.
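
The abstract's central idea, keeping inherited expert weights frozen and training only a small number of components, can be illustrated with a toy PyTorch sketch. This is not Prismer's architecture or code: the class `FrozenExpertModel`, the helper `count_params`, the linear "experts" and all dimensions are hypothetical stand-ins chosen only to show the frozen/trainable split.

```python
import torch
from torch import nn


class FrozenExpertModel(nn.Module):
    """Toy model (hypothetical): frozen 'experts' feeding one small trainable head."""

    def __init__(self, experts, feature_dim, out_dim):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        # Freeze every expert parameter so no gradients are computed for it.
        for p in self.experts.parameters():
            p.requires_grad_(False)
        # The only trainable component: a linear head over concatenated expert features.
        self.head = nn.Linear(feature_dim * len(experts), out_dim)

    def forward(self, x):
        # Run the experts without building a gradient graph.
        with torch.no_grad():
            feats = [expert(x) for expert in self.experts]
        return self.head(torch.cat(feats, dim=-1))


def count_params(model):
    """Return (trainable, total) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Assumption: randomly initialised linear layers stand in for pre-trained experts.
experts = [nn.Linear(16, 8) for _ in range(3)]
model = FrozenExpertModel(experts, feature_dim=8, out_dim=4)
trainable, total = count_params(model)

# Only parameters that still require gradients are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

In this toy setup only the head's parameters reach the optimizer, which is the sense in which most of the weights are inherited rather than trained. Real experts containing normalisation or dropout layers would also need to be put in evaluation mode.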

Code Repositories

KastanDay/video-pretrained-transformer (PyTorch; mentioned in GitHub)
nvlabs/prismer (official; PyTorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
image-captioning-on-coco-captions | Prismer | BLEU-4: 40.4; CIDEr: 136.5; METEOR: 31.4; SPICE: 24.4
image-captioning-on-nocaps-entire | Prismer | B1: 84.87; B2: 69.99; B3: 52.48; B4: 33.66; CIDEr: 110.84; METEOR: 31.13; ROUGE-L: 60.55; SPICE: 14.91
image-captioning-on-nocaps-val | Prismer | CIDEr: 107.9; SPICE: 14.8
visual-question-answering-on-vqa-v2-test-dev | Prismer | Accuracy: 78.43
visual-question-answering-on-vqa-v2-test-std | Prismer | number: 61.39; other: 69.70; overall: 78.49; yes/no: 93.09
