HyperAI
Visual Question Answering on MM-Vet v2
Evaluation metric
GPT-4 score
Evaluation results
Performance of each model on this benchmark.
| Model name | GPT-4 score | Paper title | Repository |
| --- | --- | --- | --- |
| gemini-2.0-flash-exp | 77.1±0.1 | - | - |
| GPT-4o (gpt-4o-2024-11-20) | 72.1±0.2 | GPT-4 Technical Report | |
| Claude 3.5 Sonnet (claude-3-5-sonnet-20240620) | 71.8±0.2 | Claude 3.5 Sonnet Model Card Addendum | - |
| GPT-4o (gpt-4o-2024-05-13) | 71.0±0.2 | GPT-4 Technical Report | |
| InternVL2-Llama3-76B | 68.4±0.3 | - | - |
| Qwen2-VL-72B (qwen-vl-max-0809) | 66.9±0.3 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | |
| Gemini 1.5 Pro | 66.9±0.2 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| gpt-4o-mini-2024-07-18 | 66.8±0.3 | GPT-4 Technical Report | |
| GPT-4 Turbo (gpt-4-0125-preview) | 66.3±0.2 | GPT-4 Technical Report | |
| InternVL2-40B | 63.8±0.2 | - | - |
| Gemini Pro Vision | 57.2±0.2 | Gemini: A Family of Highly Capable Multimodal Models | |
| Qwen-VL-Max | 55.8±0.2 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| Claude 3 Opus (claude-3-opus-20240229) | 55.8±0.2 | - | - |
| InternVL-Chat-V1-5 | 51.5±0.2 | How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | |
| LLaVA-NeXT-34B | 50.9±0.1 | - | - |
| InternVL-Chat-V1-2 | 45.5±0.1 | - | - |
| CogVLM-Chat | 45.1±0.2 | CogVLM: Visual Expert for Pretrained Language Models | |
| IXC2-VL-7B | 42.5±0.3 | InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | |
| Emu2-Chat | 38.0±0.1 | Generative Multimodal Models are In-Context Learners | |
| CogAgent-Chat | 34.7±0.2 | CogAgent: A Visual Language Model for GUI Agents | |