Visual Question Answering on COCO Visual 1
Evaluation Metric
Percentage correct
Evaluation Results
Performance of each model on this benchmark
Model | Percentage correct | Paper Title | Repository |
---|---|---|---|
MCB 7 att. | 70.1 | Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | |
3-Modalities: Unary + Pairwise + Ternary (ResNet) | 69.3 | High-Order Attention Models for Visual Question Answering | |
Dual-MFA | 70.04 | Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering | |
HQI+ResNet | 66.1 | Hierarchical Question-Image Co-Attention for Visual Question Answering | |
MRN | 66.3 | Multimodal Residual Learning for Visual QA | |
iBOWIMG baseline | 62.0 | Simple Baseline for Visual Question Answering | |
RelAtt | 69.60 | R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering | |
joint-loss | 67.3 | Training Recurrent Answering Units with Joint Loss Minimization for VQA | - |
LSTM Q+I | 63.1 | VQA: Visual Question Answering | |
FDA | 64.2 | A Focused Dynamic Attention Model for Visual Question Answering | - |