HyperAI超神经

Visual Question Answering (VQA) on VLM2-bench

Evaluation Metrics

Average Score on VLM2-bench (9 subtasks)
GC-mat
GC-trk
OC-cnt
OC-cpr
OC-grp
PC-VID
PC-cnt
PC-cpr
PC-grp

Evaluation Results

Performance of each model on this benchmark

| Model Name | Average Score on VLM2-bench (9 subtasks) | GC-mat | GC-trk | OC-cnt | OC-cpr | OC-grp | PC-VID | PC-cnt | PC-cpr | PC-grp | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mPLUG-Owl3-7B | 37.85 | 17.37 | 18.26 | 62.97 | 49.17 | 31.00 | 13.50 | 58.86 | 63.50 | 26.00 | mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | |
| LLaVA-OneVision-7B | 39.35 | 16.60 | 13.70 | 56.17 | 47.22 | 27.50 | 47.25 | 46.67 | 62.00 | 37.00 | LLaVA-OneVision: Easy Visual Task Transfer | |
| InternVL2.5-26B | 45.59 | 30.50 | 30.59 | 51.48 | 43.33 | 52.50 | 21.75 | 59.70 | 59.50 | 61.00 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | |
| LLaVA-Video-7B | 43.32 | 18.53 | 12.79 | 62.47 | 54.72 | 28.50 | 59.00 | 66.91 | 62.00 | 25.00 | Video Instruction Tuning With Synthetic Data | - |
| LongVA-7B | 22.59 | 14.29 | 19.18 | 42.53 | 26.67 | 18.50 | 3.75 | 38.90 | 21.50 | 18.00 | Long Context Transfer from Language to Vision | |
| InternVL2.5-8B | 41.23 | 21.24 | 26.03 | 55.23 | 53.33 | 46.50 | 5.25 | 60.00 | 51.50 | 52.00 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | |
| Qwen2-VL-7B | 42.37 | 27.80 | 19.18 | 45.99 | 68.06 | 35.00 | 16.25 | 58.59 | 61.50 | 49.00 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | |
| Qwen2.5-VL-7B | 54.82 | 35.91 | 43.38 | 41.72 | 71.39 | 47.50 | 46.50 | 57.98 | 80.00 | 69.00 | Qwen2.5-VL Technical Report | |
| GPT-4o | 60.36 | 37.45 | 39.27 | 80.62 | 74.17 | 57.50 | 66.75 | 90.50 | 50.00 | 47.00 | GPT-4o System Card | - |
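
The reported averages are consistent with an unweighted mean of the nine subtask scores, rounded to two decimals. The page does not state the formula, so treat that as an assumption. The sketch below checks it against the GPT-4o and Qwen2.5-VL-7B rows; the function name `average_score` and the dictionary layout are illustrative, not part of the benchmark's tooling.

```python
from statistics import mean

# Subtask names in the order they appear on the leaderboard.
SUBTASKS = [
    "GC-mat", "GC-trk",
    "OC-cnt", "OC-cpr", "OC-grp",
    "PC-VID", "PC-cnt", "PC-cpr", "PC-grp",
]

# Per-subtask scores copied from the leaderboard rows.
gpt4o = dict(zip(SUBTASKS, [37.45, 39.27, 80.62, 74.17, 57.50, 66.75, 90.50, 50.00, 47.00]))
qwen25_vl_7b = dict(zip(SUBTASKS, [35.91, 43.38, 41.72, 71.39, 47.50, 46.50, 57.98, 80.00, 69.00]))


def average_score(scores: dict) -> float:
    """Unweighted mean over the nine subtasks (assumed formula), rounded to 2 decimals."""
    return round(mean(scores[name] for name in SUBTASKS), 2)


print(average_score(gpt4o))         # 60.36, matches the reported average
print(average_score(qwen25_vl_7b))  # 54.82, matches the reported average
```

The same check can be run on any other row to spot transcription errors in the subtask columns.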