Simple Baseline for Visual Question Answering

Bolei Zhou; Yuandong Tian; Sainbayar Sukhbaatar; Arthur Szlam; Rob Fergus

Abstract

We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strengths and weaknesses of the trained model, we also provide an interactive web demo and open-source code.
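The concatenation step described in the abstract can be illustrated with a minimal sketch. The abstract only states that question word features and image CNN features are concatenated to predict the answer. Everything else here is an assumption made for illustration: the toy vocabulary, the answer list, the 4-dimensional stand-in for CNN features, raw word counts as the word features, a single linear softmax layer, and the random untrained weights. It is not the authors' released implementation.

```python
import math
import random


def bag_of_words(question, vocab):
    """Return a word-count vector over `vocab` (order of words is ignored)."""
    vec = [0.0] * len(vocab)
    for token in question.lower().replace("?", " ").split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(question, image_features, vocab, weights, bias):
    """Concatenate question and image features, then score each answer."""
    x = bag_of_words(question, vocab) + list(image_features)  # concatenation
    scores = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


# Hypothetical toy setup (assumed values, not from the paper).
vocab = {"what": 0, "color": 1, "is": 2, "the": 3, "cat": 4}
answers = ["black", "white", "yes"]
image_features = [0.2, 0.7, 0.1, 0.5]  # stand-in for real CNN features

# Random, untrained weights: one row per candidate answer.
rng = random.Random(0)
dim = len(vocab) + len(image_features)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in answers]
bias = [0.0] * len(answers)

probs = predict("What color is the cat?", image_features, vocab, weights, bias)
best_answer = answers[probs.index(max(probs))]
print(best_answer, probs)
```

Because the weights are random, the predicted answer is meaningless. The sketch only shows the data flow: the question becomes an order-free count vector, the image contributes a fixed-length feature vector, and a classifier over the joined vector treats answering as picking one class from a fixed answer set.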

Code Repositories

Repository	Framework	Notes
yikang-li/iqan	pytorch	Mentioned in GitHub
karunraju/VQA	pytorch	Mentioned in GitHub
sidaw/nbsvm	-	Mentioned in GitHub
miohana/vqa	-	Mentioned in GitHub
sidgan/whats_in_a_question	-	Mentioned in GitHub
metalbubble/VQAbaseline	-	Official; Mentioned in GitHub
SkyOL5/VQA-CoAttention	pytorch	Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
visual-question-answering-on-coco-visual-1	iBOWIMG baseline	Percentage correct: 62.0
visual-question-answering-on-coco-visual-4	iBOWIMG baseline	Percentage correct: 55.9
