Recursive Visual Attention in Visual Dialog

Yulei Niu; Hanwang Zhang; Manli Zhang; Jianhong Zhang; Zhiwu Lu; Ji-Rong Wen

Abstract

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image. It typically needs to address two major problems: (1) how to answer visually-grounded questions, which is the core challenge in visual question answering (VQA); (2) how to infer the co-reference between questions and the dialog history. An example of visual co-reference is: pronouns (e.g., "they") in the question (e.g., "Are they on or off?") are linked with nouns (e.g., "lamps") appearing in the dialog history (e.g., "How many lamps are there?") and the object grounded in the image. In this work, to resolve the visual co-reference for visual dialog, we propose a novel attention mechanism called Recursive Visual Attention (RvA). Specifically, our dialog agent browses the dialog history until the agent has sufficient confidence in the visual co-reference resolution, and refines the visual attention recursively. The quantitative and qualitative experimental results on the large-scale VisDial v0.9 and v1.0 datasets demonstrate that the proposed RvA not only outperforms the state-of-the-art methods, but also achieves reasonable recursion and interpretable attention maps without additional annotations. The code is available at https://github.com/yuleiniu/rva.
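The recursion described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the callables `attend`, `confidence`, and `refers_to`, the `threshold` parameter, and the confidence-weighted mixing rule are all hypothetical stand-ins chosen for illustration. The sketch only shows the control flow of "browse the history until confident, then refine the attention on the way back".

```python
from typing import Callable, List

Attention = List[float]  # one weight per image region, summing to 1


def recursive_attention(
    t: int,
    attend: Callable[[int], Attention],
    confidence: Callable[[int], float],
    refers_to: Callable[[int], int],
    threshold: float = 0.5,
) -> Attention:
    """Toy recursive attention for the question asked at dialog round t.

    All three callables are assumptions made for this sketch:
      attend(t)     -> attention computed from question t alone
      confidence(t) -> how unambiguous question t is, in [0, 1]
      refers_to(t)  -> index of an earlier round that question t refers to
    """
    own = attend(t)
    conf = confidence(t)
    # Base case: the first round, or a question that is clear on its own.
    if t == 0 or conf >= threshold:
        return own
    prev = refers_to(t)
    if not 0 <= prev < t:
        raise ValueError("refers_to must point to an earlier round")
    # Recursive case: resolve the referenced round first, then refine.
    inherited = recursive_attention(prev, attend, confidence, refers_to, threshold)
    # Hypothetical refinement rule: a convex combination of both attentions.
    return [conf * a + (1.0 - conf) * b for a, b in zip(own, inherited)]


# Toy dialog: round 0 "How many lamps are there?", round 1 "Are they on or off?"
toy_attention = {0: [0.8, 0.1, 0.1], 1: [0.4, 0.3, 0.3]}
toy_confidence = {0: 0.9, 1: 0.2}
toy_reference = {1: 0}

result = recursive_attention(
    1,
    toy_attention.__getitem__,
    toy_confidence.__getitem__,
    toy_reference.__getitem__,
)
print(result)
```

In the toy dialog, the ambiguous question at round 1 ("they") inherits most of its attention from round 0, where "lamps" is named explicitly. Because `refers_to` must return a strictly earlier round, the recursion always terminates.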

Code Repositories

yuleiniu/rva (official implementation, PyTorch): https://github.com/yuleiniu/rva

Benchmarks

Benchmark | Methodology | Metrics
visual-dialog-on-visdial-v09-val | RVA | MRR: 0.6634; Mean Rank: 3.93; R@1: 52.71; R@5: 82.97; R@10: 90.73
visual-dialog-on-visual-dialog-v1-0-test-std | RVA | MRR (x 100): 63.03; Mean: 4.18; NDCG (x 100): 55.59; R@1: 49.03; R@5: 80.40; R@10: 89.83
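The benchmark figures above are ranking metrics. As a rough guide to reading them, the sketch below computes MRR, mean rank, and R@k under the common definitions, assuming each question comes with a 1-indexed rank of the ground-truth answer among the candidate answers. The function name `retrieval_metrics` and its inputs are hypothetical; the page does not state how the listed numbers were produced. NDCG is left out because it needs per-candidate relevance scores that are not shown here.

```python
from typing import Dict, Sequence


def retrieval_metrics(
    gt_ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)
) -> Dict[str, float]:
    """Compute ranking metrics from 1-indexed ranks of the ground-truth answer.

    MRR is returned as a fraction in (0, 1]; multiply by 100 to compare with
    the "x 100" convention. R@k is returned as a percentage.
    """
    if not gt_ranks:
        raise ValueError("gt_ranks must not be empty")
    if any(r < 1 for r in gt_ranks):
        raise ValueError("ranks are 1-indexed and must be >= 1")
    n = len(gt_ranks)
    metrics = {
        # Mean reciprocal rank: average of 1 / rank.
        "MRR": sum(1.0 / r for r in gt_ranks) / n,
        # Mean rank: lower is better.
        "Mean Rank": sum(gt_ranks) / n,
    }
    for k in ks:
        # Recall@k: share of questions whose ground truth is in the top k.
        metrics[f"R@{k}"] = 100.0 * sum(1 for r in gt_ranks if r <= k) / n
    return metrics


# Toy ranks for four questions (illustrative values, not from the paper).
print(retrieval_metrics([1, 2, 4, 10]))
```

Note that the two benchmark rows report MRR on different scales (0.6634 as a fraction, 63.03 as "x 100"), so the values should be brought to a common scale before comparing them.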
