Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu, Linxi Fan, Edward Johns, Zhiding Yu, Chaowei Xiao, Anima Anandkumar

Abstract
Recent vision-language models have shown impressive multi-modal generation capabilities. However, they typically require training huge models on massive datasets. As a more scalable alternative, we introduce Prismer, a data- and parameter-efficient vision-language model that leverages an ensemble of task-specific experts. Prismer requires training only a small number of components, with the majority of network weights inherited from multiple readily available, pre-trained experts and kept frozen during training. By leveraging experts from a wide range of domains, we show that Prismer can efficiently pool this expert knowledge and adapt it to various vision-language reasoning tasks. In our experiments, we show that Prismer achieves fine-tuned and few-shot learning performance competitive with the current state of the art, whilst requiring up to two orders of magnitude less training data. Code is available at https://github.com/NVlabs/prismer.
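The core recipe described in the abstract, inheriting frozen pre-trained experts and training only a small set of new components, can be illustrated with a short PyTorch sketch. This is a minimal illustration of the idea, not the actual Prismer implementation: the class names (`FrozenExpert`, `PrismerSketch`) and the simple linear fusion layer are placeholders assumed for demonstration.

```python
# Minimal sketch (NOT the authors' implementation) of the frozen-experts
# training recipe: expert backbones stay frozen; only small new components
# receive gradients. All class names here are illustrative placeholders.
import torch
import torch.nn as nn

class FrozenExpert(nn.Module):
    """Stand-in for a readily available pre-trained expert (e.g. a depth or
    segmentation network). Its weights are inherited and kept frozen."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        for p in self.backbone.parameters():
            p.requires_grad = False  # expert weights are never updated

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # no gradients flow into the expert
            return self.backbone(x)

class PrismerSketch(nn.Module):
    """Pools the outputs of several frozen experts through one small
    trainable fusion layer (a hypothetical stand-in for the lightweight
    trained components)."""
    def __init__(self, num_experts: int = 3, dim: int = 64):
        super().__init__()
        self.experts = nn.ModuleList(FrozenExpert(dim) for _ in range(num_experts))
        self.fusion = nn.Linear(num_experts * dim, dim)  # the only trainable part

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([expert(x) for expert in self.experts], dim=-1)
        return self.fusion(feats)

model = PrismerSketch()
# Optimize only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
print(f"trainable parameters: {sum(p.numel() for p in trainable)}")
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Because gradients only touch the small fusion layer, the number of trained parameters (and hence the data needed to fit them) stays far below that of an end-to-end model, which is the intuition behind the paper's data- and parameter-efficiency claims.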
Code Repositories
https://github.com/NVlabs/prismer
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Image Captioning on COCO Captions | Prismer | BLEU-4: 40.4, CIDEr: 136.5, METEOR: 31.4, SPICE: 24.4 |
| Image Captioning on nocaps (entire) | Prismer | BLEU-1: 84.87, BLEU-2: 69.99, BLEU-3: 52.48, BLEU-4: 33.66, CIDEr: 110.84, METEOR: 31.13, ROUGE-L: 60.55, SPICE: 14.91 |
| Image Captioning on nocaps (val) | Prismer | CIDEr: 107.9, SPICE: 14.8 |
| Visual Question Answering on VQA v2 (test-dev) | Prismer | Accuracy: 78.43 |
| Visual Question Answering on VQA v2 (test-std) | Prismer | Overall: 78.49, Yes/No: 93.09, Number: 61.39, Other: 69.70 |