Visual Question Answering on VCR (Q-AR) Test
Evaluation Metric
Accuracy
Results
Performance of each model on this benchmark
Model | Accuracy (%) | Paper Title | Repository |
---|---|---|---|
KVL-BERT (Large) | 60.3 | KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning | - |
VL-T5 | 58.9 | Unifying Vision-and-Language Tasks via Text Generation | |
ERNIE-ViL-large (ensemble of 15 models) | 70.5 | ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | - |
UNITER (Large) | 62.8 | UNITER: UNiversal Image-TExt Representation Learning | |
GPT4RoI | 81.6 | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | |
VisualBERT | 52.4 | VisualBERT: A Simple and Performant Baseline for Vision and Language | |
VL-BERT (Large) | 59.7 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations | |