Visual Question Answering on GQA Test-Std
Evaluation Metric
Accuracy
Evaluation Results
Performance of each model on this benchmark
Model | Accuracy (%) | Paper Title | Repository |
---|---|---|---|
LXMERT | 60.3 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | |
CNN+LSTM | 46.55 | GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | |
single-hop + LCGN (ours) | 56.1 | Language-Conditioned Graph Networks for Relational Reasoning | |
ProTo | 65.14 | ProTo: Program-Guided Transformer for Program-Guided Tasks | |
MDETR-ENB5 | 62.45 | MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | |
NSM | 63.17 | Learning by Abstraction: The Neural State Machine | |
MAC | 54.06 | GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | |