


Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut


Abstract

The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training. However, these datasets are often collected with overly restrictive requirements inherited from their original target tasks (e.g., image caption generation), which limit the resulting dataset scale and diversity. We push the limits of vision-and-language pre-training data further by relaxing the data collection pipeline used in Conceptual Captions 3M (CC3M) [Sharma et al. 2018], and we introduce Conceptual 12M (CC12M), a dataset of 12 million image-text pairs specifically meant for vision-and-language pre-training. We analyze this dataset and benchmark its effectiveness against CC3M on multiple downstream tasks, with an emphasis on long-tail visual recognition. Our results clearly illustrate the benefit of scaling up pre-training data for vision-and-language tasks, as shown by new state-of-the-art results on both the nocaps and Conceptual Captions benchmarks.
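CC12M is described above as a collection of image-text pairs. As a minimal sketch of consuming such pairs, assume they are distributed as a tab-separated file with one image URL and one caption per line and no header row (this layout is an assumption; check the official google-research-datasets/conceptual-12m repository for the actual format). The names `ImageTextPair` and `read_pairs` are hypothetical helpers for illustration only, and the sample rows are made up, not real dataset entries.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class ImageTextPair(NamedTuple):
    """One image-text pair: an image URL and its associated caption."""
    url: str
    caption: str


def read_pairs(stream: TextIO) -> Iterator[ImageTextPair]:
    """Yield image-text pairs from an (assumed) two-column, tab-separated stream.

    Assumes one pair per line and no header row; blank or malformed rows
    are skipped rather than raising an error.
    """
    # QUOTE_NONE keeps quote characters inside captions as literal text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank lines and rows without exactly two fields
        url, caption = (field.strip() for field in row)
        if url and caption:
            yield ImageTextPair(url, caption)


if __name__ == "__main__":
    # Hypothetical sample rows in the assumed format (not real dataset entries).
    sample = io.StringIO(
        "https://example.com/a.jpg\ta dog running on the beach\n"
        "https://example.com/b.jpg\t\"vintage\" poster of a city skyline\n"
        "malformed line without a tab\n"
    )
    for pair in read_pairs(sample):
        print(pair.url, "->", pair.caption)
```

Because `read_pairs` is a generator, a file with millions of rows can be streamed without loading it into memory, e.g. `with open(path, newline="", encoding="utf-8") as f: for pair in read_pairs(f): ...`, where `path` points to a local copy of the (hypothetically named) TSV file.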

Code Repositories

gicheonkang/gst-visdial (PyTorch)
google-research-datasets/conceptual-12m (official)
facebookresearch/meru (PyTorch)

Benchmarks

Benchmark                                   Methodology  CIDEr  SPICE  Pre-train (#images)
image-captioning-on-nocaps-val-in-domain    Enc-Dec      92.6   12.5   15M
image-captioning-on-nocaps-val-near-domain  Enc-Dec      88.3   12.1   not listed
image-captioning-on-nocaps-val-out-domain   Enc-Dec      94.5   11.9   not listed
image-captioning-on-nocaps-val-overall      Enc-Dec      90.2   12.1   not listed
