HyperAI

Visual Question Answering on VQA v2 test-std

Metrics

overall

Results

Performance results of various models on this benchmark

| Model Name | overall | Paper Title | Repository |
| --- | --- | --- | --- |
| LXMERT | 72.5 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | - |
| 2D continuous softmax | 66.27 | Sparse and Continuous Attention Mechanisms | - |
| VisualBERT | 71 | VisualBERT: A Simple and Performant Baseline for Vision and Language | - |
| X2-VLM (large) | 81.8 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | - |
| Image features from bottom-up attention (adaptive K, ensemble) | 70.3 | Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge | - |
| MCB [11, 12] | 62.27 | Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | - |
| Up-Down | 70.34 | Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | - |
| Prompt Tuning | 78.53 | Prompt Tuning for Generative Multimodal Pretrained Models | - |
| MCANed-6 | 70.9 | Deep Modular Co-Attention Networks for Visual Question Answering | - |
| BEiT-3 | 84.03 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | - |
| VLMo | 81.30 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | - |
| VALOR | 78.62 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | - |
| BLOCK | 67.9 | BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection | - |
| mPLUG-Huge | 83.62 | mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | - |
| DMN | 68.4 | Learning to Count Objects in Natural Images for Visual Question Answering | - |
| BGN, ensemble | 75.92 | Bilinear Graph Networks for Visual Question Answering | - |
| SimVLM | 80.34 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | - |
| VL-BERT LARGE | 72.2 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations | - |
| Single, w/o VLP | 74.16 | In Defense of Grid Features for Visual Question Answering | - |
| Single, w/o VLP | 73.86 | Deep Multimodal Neural Architecture Search | - |