HyperAI超神经

Visual Question Answering on ViP-Bench

Evaluation Metrics

GPT-4 score (bbox)

GPT-4 score (human)

Results

Performance of each model on this benchmark

Model	GPT-4 score (bbox)	GPT-4 score (human)	Paper Title
GPT-4V-turbo-detail:high (Visual Prompt)	60.7	59.9	GPT-4 Technical Report
GPT-4V-turbo-detail:low (Visual Prompt)	52.8	51.4	GPT-4 Technical Report
LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt)	50.5	49.0	Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
ViP-LLaVA-13B (Visual Prompt)	48.3	48.2	Making Large Multimodal Models Understand Arbitrary Visual Prompts
LLaVA-1.5-13B (Coordinates)	47.1	-	Improved Baselines with Visual Instruction Tuning
Qwen-VL-Chat (Coordinates)	45.3	-	Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt)	45.1	48.2	Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
LLaVA-1.5-13B (Visual Prompt)	41.8	42.9	Improved Baselines with Visual Instruction Tuning
Qwen-VL-Chat (Visual Prompt)	39.2	41.7	Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
InstructBLIP-13B (Visual Prompt)	35.8	35.2	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
GPT4RoI-7B (RoI)	35.1	-	GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shikra-7B (Coordinates)	33.7	-	Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Kosmos-2 (Discrete Token)	26.9	-	Kosmos-2: Grounding Multimodal Large Language Models to the World
