Visual Question Answering on VCR (QA→R) Test
Evaluation Metric
Accuracy
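In VCR's QA→R setting, a model is given an image, a question and the correct answer, and must choose one of four candidate rationales. Accuracy is the percentage of questions on which the chosen rationale matches the ground truth. The sketch below illustrates that computation. The function name `multiple_choice_accuracy` and the sample predictions are hypothetical and are not taken from any official VCR evaluation script.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the percentage of examples whose predicted choice index equals the gold index.

    Hypothetical helper for illustration; not the official VCR evaluation code.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    # Count exact matches between predicted and gold rationale indices.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical example: 4 questions, each with 4 candidate rationales (indices 0-3).
print(multiple_choice_accuracy([2, 0, 3, 1], [2, 1, 3, 1]))  # 75.0
```

Reporting the score as a percentage matches how the values in the table below are expressed.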
Results
Performance of each model on this benchmark:
Model | Accuracy (%) | Paper Title | Repository |
---|---|---|---|
VL-BERT (Large) | 78.4 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations | |
VL-T5 | 77.8 | Unifying Vision-and-Language Tasks via Text Generation | |
UNITER-large (ensemble of 10 models) | 83.4 | UNITER: UNiversal Image-TExt Representation Learning | |
GPT4RoI | 91.0 | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | |
UNITER (Large) | 80.8 | UNITER: UNiversal Image-TExt Representation Learning | |
ERNIE-ViL-large (ensemble of 15 models) | 86.1 | ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | |
KVL-BERT (Large) | 78.6 | KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning | |
VisualBERT | 73.2 | VisualBERT: A Simple and Performant Baseline for Vision and Language | |