High-Order Attention Models for Visual Question Answering

Idan Schwartz; Alexander G. Schwing; Tamir Hazan

Abstract

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
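To make the idea of correlation-driven attention concrete, here is a minimal NumPy sketch for two modalities (image regions and question words). It is an illustration under stated assumptions, not the authors' implementation: the function names `init_params` and `attend`, the random projection weights, the cosine-similarity correlation, and the mean-pooling over the other modality are all hypothetical choices made for this example. It covers only unary and pairwise terms; the ternary term mentioned in the benchmark entry is omitted.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def init_params(rng, d, k):
    """Hypothetical parameters: random stand-ins for learned weights."""
    return {
        "w_v": rng.normal(size=d),       # unary scoring vector, visual
        "w_q": rng.normal(size=d),       # unary scoring vector, textual
        "W_v": rng.normal(size=(d, k)),  # pairwise projection, visual
        "W_q": rng.normal(size=(d, k)),  # pairwise projection, textual
    }


def attend(V, Q, params):
    """Attend over image regions V (n_v, d) and question words Q (n_q, d).

    Assumed scheme: each element gets a unary score from its own features
    plus a pairwise score from its correlation with the other modality.
    """
    # Unary potentials: one score per element, from that element alone.
    unary_v = V @ params["w_v"]          # (n_v,)
    unary_q = Q @ params["w_q"]          # (n_q,)

    # Pairwise potentials: cosine similarity in a shared projected space.
    Vp = l2_normalize(V @ params["W_v"])  # (n_v, k)
    Qp = l2_normalize(Q @ params["W_q"])  # (n_q, k)
    C = Vp @ Qp.T                         # (n_v, n_q) correlation matrix

    # Marginalize the correlation over the other modality.
    pair_v = C.mean(axis=1)               # (n_v,)
    pair_q = C.mean(axis=0)               # (n_q,)

    # Combine potentials and normalize into attention distributions.
    att_v = softmax(unary_v + pair_v)
    att_q = softmax(unary_q + pair_q)

    # Attended feature vectors: attention-weighted sums of the inputs.
    return att_v @ V, att_q @ Q, att_v, att_q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.normal(size=(5, 8))  # 5 image regions, 8-dim features
    Q = rng.normal(size=(3, 8))  # 3 question words, 8-dim features
    v_att, q_att, a_v, a_q = attend(V, Q, init_params(rng, d=8, k=4))
    print(a_v.round(3), a_q.round(3))
```

In this sketch the pairwise term lets a question word raise the weight of the image regions it correlates with, and vice versa, which is the kind of cross-modal guidance the abstract describes. A trained model would learn the projections end to end rather than sampling them.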

Code Repositories

idansc/HighOrderAtten (official, pytorch)

Benchmarks

Benchmark: visual-question-answering-on-coco-visual-1
Methodology: 3-Modalities: Unary + Pairwise + Ternary (ResNet)
Metrics: Percentage correct: 69.3
