HyperAI超神经
Visual Reasoning on NLVR2 (test)
Evaluation metric: Accuracy

Results: how each model performs on this benchmark.
| Model | Accuracy | Paper Title |
|---|---|---|
| CoCa | 87.0 | CoCa: Contrastive Captioners are Image-Text Foundation Models |
| UNITER (Large) | 79.5 | UNITER: UNiversal Image-TExt Representation Learning |
| SimVLM | 85.15 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision |
| VLMo | 86.86 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts |
| BLIP-129M | 83.09 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation |
| X2-VLM (large) | 89.4 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks |
| X2-VLM (base) | 87.0 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks |
| X-VLM (base) | 84.76 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts |
| SOHO | 77.32 | Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning |
| LXMERT | 76.2 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers |
| ViLT-B/32 | 76.13 | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision |
| ALBEF (14M) | 82.55 | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation |
| BEiT-3 | 92.58 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks |
| XFM (base) | 88.4 | Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks |
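The Accuracy numbers above follow the usual definition for NLVR2, which is a binary classification task: each example pairs two images with a natural-language statement, and the model predicts whether the statement is true or false. A minimal sketch of the metric (the `predictions`/`labels` variable names are illustrative, not the official NLVR2 file format):

```python
def accuracy(predictions, labels):
    """Fraction of examples where the predicted truth value matches the gold label."""
    assert len(predictions) == len(labels) and len(labels) > 0
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)

# Example: 3 of 4 statements classified correctly -> 75.00% accuracy.
preds = [True, False, True, True]
golds = [True, False, False, True]
print(f"{accuracy(preds, golds) * 100:.2f}")  # 75.00
```

A leaderboard entry such as BEiT-3's 92.58 means the model's true/false prediction matched the gold label on 92.58% of the NLVR2 test examples.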