
Dual Attention Networks for Visual Reference Resolution in Visual Dialog

Gi-Cheon Kang; Jaeseo Lim; Byoung-Tak Zhang

Abstract

Visual dialog (VisDial) is a task that requires an AI agent to answer a series of questions grounded in an image. Unlike in visual question answering (VQA), answering these questions requires capturing temporal context from the dialog history and exploiting visually grounded information. Visual reference resolution involves both challenges: the agent must resolve ambiguous references in a given question and locate the referents in the given image. In this paper, we propose Dual Attention Networks (DAN) for visual reference resolution. DAN consists of two kinds of attention networks, REFER and FIND. Specifically, the REFER module learns latent relationships between a given question and the dialog history by employing a self-attention mechanism. The FIND module takes image features and reference-aware representations (i.e., the output of the REFER module) as input and performs visual grounding via a bottom-up attention mechanism. We qualitatively and quantitatively evaluate our model on the VisDial v1.0 and v0.9 datasets, showing that DAN outperforms the previous state-of-the-art model by a significant margin.
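To make the REFER/FIND pipeline concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (the official PyTorch code is in gicheonkang/DAN-VisDial) and simplifies heavily. It assumes single-head scaled dot-product attention with no learned projections, a plain residual sum to form the reference-aware representation in REFER, and dot-product attention over region feature vectors in FIND. The helper names `attend`, `refer`, and `find` and the toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector],
           values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Single-head scaled dot-product attention.

    Returns (weighted sum of values, attention weights)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += w * x
    return out, weights


def refer(question: Vector, history: List[Vector]) -> Vector:
    """Hypothetical REFER-like step (simplified): the question attends over
    dialog-history rounds, and the attended history is added back to the
    question as a residual, giving a reference-aware representation."""
    attended, _ = attend(question, history, history)
    return [q + h for q, h in zip(question, attended)]


def find(reference_aware: Vector,
         regions: List[Vector]) -> Tuple[Vector, List[float]]:
    """Hypothetical FIND-like step (simplified): the reference-aware vector
    attends over image region features (e.g. bottom-up detector features)
    to ground the resolved reference in the image."""
    return attend(reference_aware, regions, regions)


# Toy usage: 2-d features, two history rounds, three image regions.
question = [1.0, 0.0]
history = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

ref = refer(question, history)
grounded, region_weights = find(ref, regions)
```

Because the toy question is most similar to the first history round, REFER pulls the representation toward that round, and FIND then places the largest weight on the first region. In the real model, learned projections, multiple heads, and trained feature extractors replace these fixed dot products.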

Code Repositories

gicheonkang/DAN-VisDial (official implementation, PyTorch)
phellonchen/DMRM (PyTorch)

Benchmarks

VisDial v0.9 val (visual-dialog-on-visdial-v09-val), method: DAN
MRR: 66.38
R@1: 53.33
R@5: 82.42
R@10: 90.38
Mean Rank: 4.04

VisDial v1.0 test-std (visual-dialog-on-visual-dialog-v1-0-test-std), method: DAN
NDCG (x 100): 57.59
MRR (x 100): 63.2
R@1: 49.63
R@5: 79.75
R@10: 89.35
Mean Rank: 4.3
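The retrieval metrics reported above follow standard definitions over the rank of the ground-truth answer among the candidate answers (100 per question in VisDial). MRR is the mean reciprocal rank, R@k is the fraction of questions whose ground-truth answer ranks in the top k, and Mean Rank is the average rank (lower is better). The sketch below assumes you already have those 1-based ranks, one per question. Multiply MRR and R@k by 100 to match the reported scale. NDCG on v1.0 requires the dense relevance annotations and is not covered here. `retrieval_metrics` is a hypothetical helper, not part of any official VisDial toolkit.

```python
from typing import Dict, Sequence


def retrieval_metrics(ranks: Sequence[int],
                      ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """Compute MRR, R@k, and Mean Rank from 1-based ground-truth ranks.

    Values are returned as fractions in [0, 1] (MRR, R@k) and as a plain
    average rank (Mean Rank)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based")
    n = len(ranks)
    metrics = {
        "MRR": sum(1.0 / r for r in ranks) / n,
        "Mean Rank": sum(ranks) / n,
    }
    for k in ks:
        # Fraction of questions whose ground-truth answer is in the top k.
        metrics[f"R@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Toy usage with five questions.
example = retrieval_metrics([1, 2, 4, 10, 20])
```

For these five toy ranks, MRR is (1 + 1/2 + 1/4 + 1/10 + 1/20) / 5 = 0.38, Mean Rank is 7.4, and R@1, R@5, and R@10 are 0.2, 0.6, and 0.8.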
