HyperAI
HyperAI超神经
Visual Reasoning On Winoground
Evaluation Metrics

Each Winoground example consists of two images and two captions that contain the same words in a different order; each caption matches exactly one image.

- Text Score: the model selects the correct caption for both images.
- Image Score: the model selects the correct image for both captions.
- Group Score: both the text and image pairings are correct for the example.
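Given a model's image-caption similarity scores for one example, the three metrics reduce to simple pairwise comparisons. A minimal sketch (function and variable names are illustrative, not from any specific codebase; `s[i][c]` is assumed to hold the similarity between image `i` and caption `c`, where caption `k` is the correct match for image `k`):

```python
def text_correct(s):
    # For each image, the matching caption must score higher
    # than the swapped one.
    return s[0][0] > s[0][1] and s[1][1] > s[1][0]

def image_correct(s):
    # For each caption, the matching image must score higher
    # than the swapped one.
    return s[0][0] > s[1][0] and s[1][1] > s[0][1]

def group_correct(s):
    # Both pairings must be right simultaneously.
    return text_correct(s) and image_correct(s)

def winoground_scores(examples):
    """examples: list of 2x2 similarity matrices, one per item."""
    n = len(examples)
    return {
        "text_score": 100 * sum(text_correct(s) for s in examples) / n,
        "image_score": 100 * sum(image_correct(s) for s in examples) / n,
        "group_score": 100 * sum(group_correct(s) for s in examples) / n,
    }
```

Because the group score requires both conditions to hold at once, it is bounded above by the smaller of the text and image scores, which is why it is the lowest of the three for every model in the table below.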
Benchmark Results

Performance of each model on this benchmark.
| Model | Group Score | Image Score | Text Score | Paper Title | Repository |
|---|---|---|---|---|---|
| GPT-4V (CoT, pick b/w two options) | 58.75 | 68.75 | 75.25 | The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task | - |
| GPT-4V (pick b/w two options) | 39.25 | 46.25 | 69.25 | The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task | - |
| MMICL + CoCoT | 50.75 | 52.50 | 64.25 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | - |
| GPT-4V + CoCoT | 44.50 | 49.50 | 58.50 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | - |
| OpenFlamingo + CoCoT | 41.50 | 55.25 | 58.25 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | - |
| GPT-4V | 37.75 | 42.50 | 54.50 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | - |
| FIBER (EqSim) | 27.50 | 32.00 | 51.50 | Equivariant Similarity for Vision-Language Foundation Models | - |
| FIBER (finetuned, Flickr30k) | 23.00 | 26.50 | 51.25 | Equivariant Similarity for Vision-Language Foundation Models | - |
| MMICL + CCoT | 47.50 | 48.00 | 51.00 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | - |
| OpenFlamingo + DDCoT | 39.00 | 47.25 | 47.50 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | - |
| VQ2 | 30.50 | 42.20 | 47.00 | What You See is What You Read? Improving Text-Image Alignment Evaluation | - |
| MMICL + DDCoT | 36.75 | 45.00 | 46.75 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | - |
| X-VLM 16M | 21.20 | 24.50 | 46.70 | Measuring Progress in Fine-grained Vision-and-Language Understanding | - |
| PaLI (ft SNLI-VE + Synthetic Data) | 28.75 | 38.00 | 46.50 | What You See is What You Read? Improving Text-Image Alignment Evaluation | - |
| FIBER | 22.25 | 25.75 | 46.25 | Equivariant Similarity for Vision-Language Foundation Models | - |
| MMICL (FLAN-T5-XXL) | 43.00 | 44.99 | 45.50 | MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning | - |
| METER (EqSim) | 18.75 | 22.75 | 45.00 | Equivariant Similarity for Vision-Language Foundation Models | - |
| PaLI (ft SNLI-VE) | 28.70 | 41.50 | 45.00 | What You See is What You Read? Improving Text-Image Alignment Evaluation | - |
| Gemini + DDCoT | 23.75 | 25.00 | 45.00 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | - |
| X-VLM 4M | 21.50 | 26.70 | 44.00 | Measuring Progress in Fine-grained Vision-and-Language Understanding | - |
Showing the first 20 of 113 entries.