Hierarchical Question-Image Co-Attention for Visual Question Answering

Jiasen Lu; Jianwei Yang; Dhruv Batra; Devi Parikh

Abstract

A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling "where to look" or visual attention, it is equally important to model "what words to listen to" or question attention. We present a novel co-attention model for VQA that jointly reasons about image and question attention. In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN). Our model improves the state of the art on the VQA dataset from 60.3% to 60.5%, and on the COCO-QA dataset from 61.6% to 63.3%. By using ResNet, the performance is further improved to 62.1% for VQA and 65.4% for COCO-QA.
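The joint reasoning over image and question attention described in the abstract can be illustrated with a small NumPy sketch. It assumes an affinity-matrix formulation in which each modality's attention is conditioned on the other; the function name, the weight names (`W_b`, `W_v`, `W_q`, `w_hv`, `w_hq`), the dimensions, and the random initialization are all assumptions made for illustration. This is not the authors' implementation, and it omits the hierarchical 1-dimensional CNN over the question.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def parallel_co_attention(V, Q, W_b, W_v, W_q, w_hv, w_hq):
    """Illustrative co-attention step (hypothetical helper).

    V: (d, N) features for N image regions.
    Q: (d, T) features for T question words.
    Returns attention weights over regions and words, plus the
    attended image and question vectors.
    """
    # Affinity between every question word and every image region: (T, N).
    C = np.tanh(Q.T @ W_b @ V)
    # Each modality's hidden state also sees the other through C.
    H_v = np.tanh(W_v @ V + (W_q @ Q) @ C)    # (k, N)
    H_q = np.tanh(W_q @ Q + (W_v @ V) @ C.T)  # (k, T)
    a_v = softmax(w_hv @ H_v)  # "where to look": weights over regions
    a_q = softmax(w_hq @ H_q)  # "what words to listen to": weights over words
    v_hat = V @ a_v  # attended image vector, shape (d,)
    q_hat = Q @ a_q  # attended question vector, shape (d,)
    return a_v, a_q, v_hat, q_hat


# Toy sizes and random weights, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
d, k, N, T = 8, 6, 5, 4
V = rng.standard_normal((d, N))
Q = rng.standard_normal((d, T))
W_b = rng.standard_normal((d, d))
W_v = rng.standard_normal((k, d))
W_q = rng.standard_normal((k, d))
w_hv = rng.standard_normal(k)
w_hq = rng.standard_normal(k)

a_v, a_q, v_hat, q_hat = parallel_co_attention(V, Q, W_b, W_v, W_q, w_hv, w_hq)
```

Here `a_v` and `a_q` each sum to 1, so they can be read as distributions over image regions and question words respectively.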

Code Repositories

karunraju/VQA (pytorch, mentioned in GitHub)
WillSuen/VQA (tf, mentioned in GitHub)
phisad/keras-hicoatt (tf, mentioned in GitHub)
jiasenlu/HieCoAttenVQA (official, pytorch, mentioned in GitHub)
miohana/vqa (tf, mentioned in GitHub)
SkyOL5/VQA-CoAttention (pytorch, mentioned in GitHub)
arya46/VQA_HieCoAtt (tf, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
visual-dialog-on-visdial-v09-val | HieCoAtt-QI | MRR: 57.88; Mean Rank: 5.84; R@1: 43.51; R@5: 74.49; R@10: 83.96
visual-question-answering-on-coco-visual-1 | HQI+ResNet | Percentage correct: 66.1
visual-question-answering-on-coco-visual-4 | HQI+ResNet | Percentage correct: 62.1
visual-question-answering-on-vqa-v1-test-dev | HieCoAtt (ResNet) | Accuracy: 61.8
visual-question-answering-on-vqa-v1-test-std | HieCoAtt (ResNet) | Accuracy: 62.1
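
The visual-dialog row reports ranking metrics (MRR, Mean Rank, R@k). As a rough guide to how such numbers are usually derived, the sketch below computes them from the 1-based rank of the correct answer for each question. The function name and the example ranks are hypothetical, and scaling MRR and R@k by 100 is an assumption made to match the percentage-style values listed above.

```python
def retrieval_metrics(ranks, ks=(1, 5, 10)):
    """Compute ranking metrics from 1-based ranks of the correct answer.

    ranks: list of positive integers, one per evaluated question.
    Returns MRR and R@k as percentages, and Mean Rank as a plain average.
    """
    n = len(ranks)
    metrics = {
        # Mean reciprocal rank: average of 1/rank, scaled to a percentage.
        "MRR": 100.0 * sum(1.0 / r for r in ranks) / n,
        # Mean rank: lower is better.
        "Mean Rank": sum(ranks) / n,
    }
    for k in ks:
        # Recall@k: share of questions whose correct answer is in the top k.
        metrics[f"R@{k}"] = 100.0 * sum(1 for r in ranks if r <= k) / n
    return metrics


# Made-up ranks for four questions, for illustration only.
example_ranks = [1, 2, 4, 10]
example_metrics = retrieval_metrics(example_ranks)
```

Higher MRR and R@k are better, while a lower Mean Rank is better, which is why the three kinds of figures are reported side by side.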
