Visual Question Answering on GQA Test-Std
Metrics
Accuracy
Results
Performance results of various models on this benchmark
Model Name | Accuracy (%) | Paper Title | Repository |
---|---|---|---|
ProTo | 65.14 | ProTo: Program-Guided Transformer for Program-Guided Tasks | - |
NSM | 63.17 | Learning by Abstraction: The Neural State Machine | - |
MDETR-ENB5 | 62.45 | MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | - |
LXMERT | 60.3 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | - |
single-hop + LCGN (ours) | 56.1 | Language-Conditioned Graph Networks for Relational Reasoning | - |
MAC | 54.06 | GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | - |
CNN+LSTM | 46.55 | GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | - |