HyperAI超神经

Visual Question Answering on A-OKVQA

Evaluation Metrics

DA VQA Score
MC Accuracy
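
The two metrics can be illustrated with a minimal sketch. It assumes that the DA (direct answer) score follows the standard VQA soft-accuracy rule, where a prediction earns full credit once at least three reference answers match it, and that MC accuracy is the fraction of multiple-choice questions answered correctly. The function names `da_vqa_score` and `mc_accuracy` are hypothetical, and answer normalization (lowercasing, punctuation stripping) is omitted, so this is not the benchmark's official evaluation script.

```python
from typing import Sequence


def da_vqa_score(prediction: str, reference_answers: Sequence[str]) -> float:
    """Soft accuracy for one question (assumed rule: min(1, matches / 3))."""
    # Count reference answers that exactly match the prediction.
    matches = sum(1 for answer in reference_answers if answer == prediction)
    return min(1.0, matches / 3.0)


def mc_accuracy(predicted: Sequence[int], correct: Sequence[int]) -> float:
    """Fraction of questions where the chosen option index is the correct one."""
    if len(predicted) != len(correct):
        raise ValueError("predicted and correct must have the same length")
    if not correct:
        return 0.0
    hits = sum(1 for p, c in zip(predicted, correct) if p == c)
    return hits / len(correct)


if __name__ == "__main__":
    # Illustrative data only: two of ten references agree with the prediction.
    references = ["dog"] * 2 + ["cat"] * 8
    print(da_vqa_score("dog", references))
    print(mc_accuracy([0, 1, 2, 3], [0, 1, 0, 3]))
```

Per-question DA scores are averaged over the evaluation set; leaderboards usually report both metrics as percentages.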

Evaluation Results

Performance of each model on this benchmark

Model | DA VQA Score | MC Accuracy | Paper Title | Repository
ViLBERT - OK-VQA | 9.2 | 34.1 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
GPV-2 | 40.7 | 53.7 | Webly Supervised Concept Expansion for General Purpose Vision Models | -
PromptCap | 59.6 | 73.2 | PromptCap: Prompt-Guided Task-Aware Image Captioning |
A Simple Baseline for KB-VQA | 57.5 | - | A Simple Baseline for Knowledge-Based Visual Question Answering | -
VLC-BERT | 38.05 | - | VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge |
PaLI-X-VPD | 68.2 | 80.4 | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | -
LXMERT | 25.9 | 41.6 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers |
KRISP | 42.2 | 42.2 | KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA | -
Prophet | 58.5 | 75.1 | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering |
MC-CoT | - | 71 | Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training |
Pythia | 21.9 | 40.1 | Pythia v0.1: the Winning Entry to the VQA Challenge 2018 |
SMoLA-PaLI-X Specialist Model | 70.55 | 83.75 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | -
ViLBERT | 25.9 | 41.5 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
ViLBERT - VQA | 12.0 | 42.1 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks |
HYDRA | - | 56.35 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning |