Visual Question Answering on MM-Vet
Evaluation metric
GPT-4 score
Evaluation results
Performance of each model on this benchmark
| Model | GPT-4 score | Paper Title | Repository |
| --- | --- | --- | --- |
| gemini-2.0-flash-exp | 81.2±0.4 | - | - |
| gemini-exp-1206 | 78.1±0.2 | - | - |
| Gemini 1.5 Pro (gemini-1.5-pro-002) | 76.9±0.1 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| MMCTAgent (GPT-4 + GPT-4V) | 74.24 | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | - |
| Claude 3.5 Sonnet (claude-3-5-sonnet-20240620) | 74.2±0.2 | Claude 3.5 Sonnet Model Card Addendum | - |
| Qwen2-VL-72B | 74.0 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | |
| InternVL2.5-78B | 72.3 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | |
| GPT-4o +text rationale +IoT | 72.2 | Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models | - |
| Lyra-Pro | 71.4 | Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | |
| GLM-4V-Plus | 71.1 | CogVLM2: Visual Language Models for Image and Video Understanding | |
| Phantom-7B | 70.8 | Phantom of Latent for Large Language and Vision Models | |
| GPT-4o (gpt-4o-2024-05-13) | 69.3±0.1 | GPT-4 Technical Report | |
| InternVL2.5-38B | 68.8 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | |
| gpt-4o-mini-2024-07-18 | 68.6±0.1 | GPT-4 Technical Report | |
| GPT-4V | 67.7±0.3 | GPT-4 Technical Report | |
| GPT-4V-Turbo-detail:high | 67.6±0.1 | GPT-4 Technical Report | |
| Qwen-VL-Max | 66.6±0.5 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| Gemini 1.5 Pro (gemini-1.5-pro) | 65.8±0.1 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| InternVL2-26B (SGP, token ratio 64%) | 65.60 | A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs | |
| Baichuan-Omni (7B) | 65.4 | Baichuan-Omni Technical Report | |