Learning Visual N-Grams from Web Data

Ang Li, Allan Jabri, Armand Joulin, Laurens van der Maaten


Abstract

Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts. The traditional approach of annotating thousands of images per class for training is infeasible in such a scenario, prompting the use of webly supervised data. This paper explores the training of image-recognition systems on large numbers of images and associated user comments. In particular, we develop visual n-gram models that can predict arbitrary phrases that are relevant to the content of an image. Our visual n-gram models are feed-forward convolutional networks trained using new loss functions that are inspired by n-gram models commonly used in language modeling. We demonstrate the merits of our models in phrase prediction, phrase-based image retrieval, relating images and captions, and zero-shot transfer.
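The abstract describes scoring arbitrary phrases (n-grams) against an image with a convolutional network and a language-model-inspired loss. As a minimal sketch of that idea (hypothetical names and shapes, not the authors' implementation): an image feature vector is scored against one embedding per n-gram in a fixed dictionary, and the loss is the negative log-likelihood of the n-grams observed in the image's user comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def ngram_loss(image_feat, ngram_embeddings, observed_ids):
    """Cross-entropy of the observed n-grams under a softmax over the dictionary.

    image_feat:       (D,) image feature, e.g. from a conv net
    ngram_embeddings: (V, D) one learned embedding per dictionary n-gram
    observed_ids:     indices of n-grams present in the image's comment
    """
    scores = ngram_embeddings @ image_feat            # (V,) one score per n-gram
    scores = scores - scores.max()                    # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum()) # log-softmax over dictionary
    return -log_probs[observed_ids].mean()

# Toy example: 128-d image feature, dictionary of 1000 n-grams.
image_feat = rng.normal(size=128)
ngram_embeddings = rng.normal(size=(1000, 128))
loss = ngram_loss(image_feat, ngram_embeddings, observed_ids=[3, 17, 42])
print(float(loss))
```

In the paper's setting the image feature would come from the feed-forward convolutional network and the loss is smoothed with n-gram language-model techniques; the plain softmax here is a simplification.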

Benchmarks

Benchmark                                      Methodology     Metrics
zero-shot-transfer-image-classification-on     Visual N-Grams  Accuracy: 72.4
zero-shot-transfer-image-classification-on-2   Visual N-Grams  Accuracy: 23.0

