Variational Causal Inference Network for Explanatory Visual Question Answering

Changsheng Xu, Shengsheng Qian, Dizhan Xue

Abstract

Explanatory Visual Question Answering (EVQA) is a recently proposed multimodal reasoning task that requires answering visual questions and generating multimodal explanations for the reasoning processes. Unlike traditional Visual Question Answering (VQA), which focuses solely on answering, EVQA aims to provide user-friendly explanations to enhance the explainability and credibility of reasoning models. However, existing EVQA methods typically predict the answer and the explanation separately, which ignores the causal correlation between them. Moreover, they neglect the complex relationships among question words, visual regions, and explanation tokens. To address these issues, we propose a Variational Causal Inference Network (VCIN) that establishes the causal correlation between predicted answers and explanations, and captures cross-modal relationships to generate rational explanations. First, we utilize a vision-and-language pretrained model to extract visual features and question features. Second, we propose a multimodal explanation gating transformer that constructs cross-modal relationships and generates rational explanations. Finally, we propose a variational causal inference method to establish the target causal structure and predict the answers. Comprehensive experiments demonstrate the superiority of VCIN over state-of-the-art EVQA methods.
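
The abstract names a "multimodal explanation gating transformer" but does not specify how the gate is computed. As a rough illustration of what gating between two modalities generally means, the sketch below blends a visual feature vector and a question feature vector with a single sigmoid gate. This is a generic gated-fusion toy, not the VCIN architecture: the function name `gated_fusion`, the scalar gate, and the hand-set weights are all assumptions made for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, text, gate_weights, gate_bias=0.0):
    """Blend two equal-length feature vectors with one scalar gate.

    Hypothetical illustration, not the paper's method:
        gate  = sigmoid(w . [visual; text] + b)
        fused = gate * visual + (1 - gate) * text
    """
    if len(visual) != len(text):
        raise ValueError("visual and text vectors must have the same length")
    concat = list(visual) + list(text)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")

    # Linear score over the concatenated features, squashed into (0, 1).
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(score)

    # Convex combination: gate near 1 favors visual, near 0 favors text.
    fused = [gate * v + (1.0 - gate) * t for v, t in zip(visual, text)]
    return gate, fused


if __name__ == "__main__":
    visual = [1.0, 0.0]
    text = [0.0, 1.0]
    # With all-zero weights and zero bias the gate is exactly 0.5,
    # so the fused vector is the plain average of the two inputs.
    gate, fused = gated_fusion(visual, text, [0.0, 0.0, 0.0, 0.0])
    print(gate, fused)
```

In a trained model the gate weights would be learned and the gate would typically be computed per feature or per token rather than as one scalar; the scalar form is used here only to keep the example short.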

Benchmarks

Benchmark: explanatory-visual-question-answering-on-gqa
Methodology: VCIN
Metrics:
  BLEU-4: 58.65
  CIDEr: 519.23
  GQA-test: 60.61
  GQA-val: 81.80
  Grounding: 77.33
  METEOR: 41.57
  ROUGE-L: 81.45
  SPICE: 54.63

Benchmark: fs-mevqa-on-sme
Methodology: VCIN
Metrics:
  #Learning Samples (N): 16
  ACC: 17.77
  BLEU-4: 9.17
  CIDEr: 4.28
  Detection: 0.28
  METEOR: 19.82
  ROUGE-L: 33.34
  SPICE: 13.39