HyperAI
Visual Question Answering on ViP-Bench
Evaluation metrics
GPT-4 score (bbox)
GPT-4 score (human)
Evaluation results
Performance of each model on this benchmark
| Model name | GPT-4 score (bbox) | GPT-4 score (human) | Paper Title |
| --- | --- | --- | --- |
| GPT-4V-turbo-detail:high (Visual Prompt) | 60.7 | 59.9 | GPT-4 Technical Report |
| GPT-4V-turbo-detail:low (Visual Prompt) | 52.8 | 51.4 | GPT-4 Technical Report |
| LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt) | 50.5 | 49.0 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning |
| ViP-LLaVA-13B (Visual Prompt) | 48.3 | 48.2 | Making Large Multimodal Models Understand Arbitrary Visual Prompts |
| LLaVA-1.5-13B (Coordinates) | 47.1 | - | Improved Baselines with Visual Instruction Tuning |
| Qwen-VL-Chat (Coordinates) | 45.3 | - | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
| LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt) | 45.1 | 48.2 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning |
| LLaVA-1.5-13B (Visual Prompt) | 41.8 | 42.9 | Improved Baselines with Visual Instruction Tuning |
| Qwen-VL-Chat (Visual Prompt) | 39.2 | 41.7 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
| InstructBLIP-13B (Visual Prompt) | 35.8 | 35.2 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning |
| GPT4ROI 7B (ROI) | 35.1 | - | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest |
| Shikra-7B (Coordinates) | 33.7 | - | Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic |
| Kosmos-2 (Discrete Token) | 26.9 | - | Kosmos-2: Grounding Multimodal Large Language Models to the World |