HyperAI超神经
Visual Question Answering on VCR (Q-A) test
Evaluation Metric
Accuracy
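Accuracy here is presumably the percentage of test questions for which a model selects the correct answer, which matches the scale of the scores listed on this page. A minimal sketch, assuming each question has exactly one correct option identified by an integer index; the function name `accuracy` and the toy data are illustrative and not part of any benchmark tooling:

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Return the percentage of questions answered correctly."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy over zero questions")
    # Count the positions where the chosen option matches the correct one.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical toy data: index of the chosen option and of the correct option.
predicted = [0, 2, 1, 3]
gold = [0, 2, 3, 3]
print(accuracy(predicted, gold))  # 75.0
```

Reporting the value as a percentage rather than a fraction keeps it directly comparable with the leaderboard scores.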
Results
Performance of each model on this benchmark
Model | Accuracy | Paper Title | Repository
VL-BERT_LARGE | 75.8 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations |
MAD (Single Model, Formerly CLIP-TD) | 79.6 | Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | -
UNITER (Large) | 77.3 | UNITER: UNiversal Image-TExt Representation Learning |
GPT4RoI | 89.4 | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest |
VisualBERT | 71.6 | VisualBERT: A Simple and Performant Baseline for Vision and Language |
ERNIE-ViL-large (ensemble of 15 models) | 81.6 | ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | -
UNITER-large (10 ensemble) | 79.8 | UNITER: UNiversal Image-TExt Representation Learning |
OFA-X | 71.2 | Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations |
OFA-X-MT | 62 | Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations |
VL-T5 | 75.3 | Unifying Vision-and-Language Tasks via Text Generation |
KVL-BERT_LARGE | 76.4 | KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning | -