Bilinear Attention Networks

Jin-Hwa Kim; Jaehyun Jun; Byoung-Tak Zhang

Abstract

Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. However, the computational cost of learning attention distributions for every pair of multimodal input channels is prohibitive. To avoid this cost, co-attention builds two separate attention distributions, one for each modality, neglecting the interaction between multimodal inputs. In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly. BAN considers bilinear interactions between two groups of input channels, while low-rank bilinear pooling extracts the joint representations for each pair of channels. Furthermore, we propose a variant of multimodal residual networks to exploit the eight attention maps of the BAN efficiently. We quantitatively and qualitatively evaluate our model on the visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves a new state of the art on both datasets.
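
The core idea in the abstract, a single attention distribution over every pair of channels from the two modalities, scored through a low-rank bilinear form, can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Python lists, omits nonlinearities, multiple attention maps, and the residual variant, and all names (`bilinear_attention_map`, `attended_joint_feature`, `U`, `V`, `p`, the rank of 6) are hypothetical choices made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def bilinear_attention_map(X, Y, U, V, p):
    """Score every (x_i, y_j) pair with a low-rank bilinear form.

    X: n_x channels of dimension d_x, Y: n_y channels of dimension d_y.
    U (d_x x k) and V (d_y x k) project both inputs to a shared rank k,
    and p (length k) weights the element-wise product of the projections.
    The softmax runs over all n_x * n_y pairs jointly, not per modality.
    """
    XU = matmul(X, U)  # n_x x k
    YV = matmul(Y, V)  # n_y x k
    logits = [[sum(w * a * b for w, a, b in zip(p, xu, yv)) for yv in YV]
              for xu in XU]
    peak = max(max(row) for row in logits)  # subtract the max for stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def attended_joint_feature(X, Y, U, V, A):
    """Pool the pairwise low-rank joint representations, weighted by A."""
    XU = matmul(X, U)
    YV = matmul(Y, V)
    rank = len(XU[0])
    return [sum(A[i][j] * XU[i][c] * YV[j][c]
                for i in range(len(XU)) for j in range(len(YV)))
            for c in range(rank)]


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random matrix standing in for learned parameters or input features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


X = rand_matrix(3, 4)  # e.g. 3 visual channels (image regions), dimension 4
Y = rand_matrix(2, 5)  # e.g. 2 language channels (question words), dimension 5
U, V = rand_matrix(4, 6), rand_matrix(5, 6)  # assumed rank k = 6
p = [rng.uniform(-1.0, 1.0) for _ in range(6)]

A = bilinear_attention_map(X, Y, U, V, p)     # 3 x 2 map that sums to 1
f = attended_joint_feature(X, Y, U, V, A)     # joint feature of length 6
```

In this sketch the projected inputs have size (n_x + n_y) × k, so the full pairwise score table is never parameterized by a d_x × d_y weight matrix; that is the usual motivation for a low-rank factorization. Co-attention, by contrast, would normalize over the rows and columns separately and so would not weight individual pairs.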

Code Repositories

jnhwkim/ban-vqa (official, PyTorch, mentioned in GitHub)
ronghanghu/pythia (PyTorch, mentioned in GitHub)
facebookresearch/pythia (PyTorch, mentioned in GitHub)
Cyanogenoid/vqa-counting (PyTorch, mentioned in GitHub)
allenai/pythia (PyTorch, mentioned in GitHub)
jackroos/pythia (PyTorch, mentioned in GitHub)
facebookresearch/mmf (PyTorch, mentioned in GitHub)
ZephyrZhuQi/ssbaseline (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
phrase-grounding-on-flickr30k-entities-test | BAN (Bottom-Up detector) | R@1: 69.69; R@5: 84.22; R@10: 86.35
visual-question-answering-on-vqa-v2-test-dev | BAN+Glove+Counter | Accuracy: 70.04
visual-question-answering-on-vqa-v2-test-std | BAN+Glove+Counter | overall: 70.4
