4 months ago

VQA: Visual Question Answering

Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh

Abstract

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).
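The abstract notes that VQA is amenable to automatic evaluation because most answers are only a few words long. The abstract itself does not spell out the metric, so the following is only a minimal sketch of the consensus-style accuracy commonly associated with the VQA benchmark, in which a predicted answer counts as fully correct when at least three human annotators gave the same answer. The function names `normalize` and `vqa_accuracy` are hypothetical, and the normalization step (lowercasing and stripping punctuation) is a simplifying assumption rather than the official preprocessing.

```python
import string
from collections import Counter
from typing import Sequence


def normalize(answer: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    This is a simplified stand-in for answer preprocessing (assumption).
    """
    table = str.maketrans("", "", string.punctuation)
    return " ".join(answer.lower().translate(table).split())


def vqa_accuracy(predicted: str, human_answers: Sequence[str]) -> float:
    """Consensus accuracy: min(#humans who gave this answer / 3, 1)."""
    counts = Counter(normalize(a) for a in human_answers)
    # Counter returns 0 for answers no annotator gave.
    return min(counts[normalize(predicted)] / 3.0, 1.0)


if __name__ == "__main__":
    # Hypothetical annotations for one question: 2 say "yes", 8 say "no".
    human = ["yes"] * 2 + ["no"] * 8
    for guess in ("no", "Yes.", "maybe"):
        print(guess, round(vqa_accuracy(guess, human), 3))
```

Under this sketch, an answer given by only two annotators earns partial credit (2/3), which is one way short free-form answers can be scored without a human in the loop.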

Code Repositories

Repositories mentioned on GitHub (framework in parentheses where listed):

chirag26495/DAN_VQA (PyTorch)
mokhalid-dev/Attention-based-VQA-model (PyTorch)
ramprs/grad-cam (PyTorch)
mkhalil1998/EC601_Group_Project (PyTorch)
vipulgupta1011/swapmix (PyTorch)
yanxinyan1/yxy (PyTorch)
moh833/VQA
SatyamGaba/vqa (PyTorch)
SatyamGaba/visual_question_answering (PyTorch)
tbmoon/basic_vqa (PyTorch)
ntusteeian/VQA_CNN-LSTM (PyTorch)
mishajw/vocab_pie
ruxuan666/VQA_program (PyTorch)
SDaydreamer/VisualQA_Project (PyTorch)
abhshkdz/neural-vqa-attention (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
visual-question-answering-on-coco-visual | DLAIT | Percentage correct: 68.07
visual-question-answering-on-coco-visual | HDU-USYD-UNCC | Percentage correct: 68.16
visual-question-answering-on-coco-visual-1 | LSTM Q+I | Percentage correct: 63.1
visual-question-answering-on-coco-visual-2 | LSTM + global features | Percentage correct: 65.02
visual-question-answering-on-coco-visual-2 | Dualnet ensemble | Percentage correct: 69.73
visual-question-answering-on-coco-visual-2 | LSTM blind | Percentage correct: 57.19
visual-question-answering-on-coco-visual-3 | Dualnet ensemble | Percentage correct: 71.18
visual-question-answering-on-coco-visual-3 | LSTM + global features | Percentage correct: 69.21
visual-question-answering-on-coco-visual-3 | LSTM blind | Percentage correct: 61.41
visual-question-answering-on-coco-visual-4 | LSTM Q+I | Percentage correct: 58.2
