
Self-Critical Reasoning for Robust Visual Question Answering

Jialin Wu; Raymond J. Mooney

Abstract

Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because of strong language priors, and they fail to generalize to test data with a significantly different question-answer (QA) distribution. To address this issue, we introduce a self-critical training objective that ensures that visual explanations of correct answers match the most influential image regions more than those of other competitive answer candidates. The influential regions are determined either from human visual/textual explanations or automatically from salient words in the question and answer. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state of the art: 49.5% accuracy using textual explanations and 48.5% using automatically annotated regions.
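To make the objective concrete, here is a minimal PyTorch sketch of the kind of self-critical term the abstract describes. This is not the authors' released code: the gradient-times-input sensitivity measure, the helper names (answer_sensitivity, self_critical_loss), and the simple hinge form are illustrative assumptions.

```python
import torch

def answer_sensitivity(answer_logits, region_feats, answer_idx):
    """Gradient-times-input sensitivity of one answer to each image region.

    answer_logits: [num_answers] pre-sigmoid scores from a UpDn-style VQA model.
    region_feats:  [num_regions, feat_dim] object-proposal features that were
                   used (with requires_grad=True) to produce answer_logits.
    Returns a [num_regions] tensor; larger values mean the region influenced
    the answer score more.
    """
    grads = torch.autograd.grad(
        answer_logits[answer_idx], region_feats,
        retain_graph=True, create_graph=True,  # keep the graph so the loss below is trainable
    )[0]
    return (grads * region_feats).sum(dim=-1)

def self_critical_loss(answer_logits, region_feats, gt_idx, rival_idxs, infl_region):
    """Hinge penalty whenever a competitive (rival) answer is more sensitive
    than the ground-truth answer to the influential region `infl_region`."""
    s_gt = answer_sensitivity(answer_logits, region_feats, gt_idx)[infl_region]
    penalty = answer_logits.new_zeros(())
    for a in rival_idxs:
        s_rival = answer_sensitivity(answer_logits, region_feats, a)[infl_region]
        penalty = penalty + torch.clamp(s_rival - s_gt, min=0.0)
    return penalty
```

In training, a term like this would be added (with a weight) to the standard VQA loss; the rival answers are the model's top-scoring wrong predictions, and the influential region comes from the human or automatic annotations mentioned above. The paper's exact sensitivity definition and loss weighting may differ from this sketch.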

Benchmarks

Benchmark: visual-question-answering-on-vqa-cp
Methodology: UpDn+SCR (VQA-X)
Metrics: Score 49.45
