Visual Question Answering (VQA) on VLM2-bench
Evaluation Metrics
Average Score on VLM2-bench (9 subtasks)
GC-mat
GC-trk
OC-cnt
OC-cpr
OC-grp
PC-VID
PC-cnt
PC-cpr
PC-grp
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Average Score on VLM2-bench (9 subtasks) | GC-mat | GC-trk | OC-cnt | OC-cpr | OC-grp | PC-VID | PC-cnt | PC-cpr | PC-grp |
---|---|---|---|---|---|---|---|---|---|---|
mplug-owl3-towards-long-image-sequence | 37.85 | 17.37 | 18.26 | 62.97 | 49.17 | 31.00 | 13.50 | 58.86 | 63.50 | 26.00 |
llava-onevision-easy-visual-task-transfer | 39.35 | 16.60 | 13.70 | 56.17 | 47.22 | 27.50 | 47.25 | 46.67 | 62.00 | 37.00 |
expanding-performance-boundaries-of-open | 45.59 | 30.50 | 30.59 | 51.48 | 43.33 | 52.50 | 21.75 | 59.70 | 59.50 | 61.00 |
video-instruction-tuning-with-synthetic-data | 43.32 | 18.53 | 12.79 | 62.47 | 54.72 | 28.50 | 59.00 | 66.91 | 62.00 | 25.00 |
long-context-transfer-from-language-to-vision | 22.59 | 14.29 | 19.18 | 42.53 | 26.67 | 18.50 | 3.75 | 38.90 | 21.50 | 18.00 |
expanding-performance-boundaries-of-open | 41.23 | 21.24 | 26.03 | 55.23 | 53.33 | 46.50 | 5.25 | 60.00 | 51.50 | 52.00 |
qwen2-vl-enhancing-vision-language-model-s | 42.37 | 27.80 | 19.18 | 45.99 | 68.06 | 35.00 | 16.25 | 58.59 | 61.50 | 49.00 |
qwen2-5-vl-technical-report | 54.82 | 35.91 | 43.38 | 41.72 | 71.39 | 47.50 | 46.50 | 57.98 | 80.00 | 69.00 |
gpt-4o-system-card | 60.36 | 37.45 | 39.27 | 80.62 | 74.17 | 57.50 | 66.75 | 90.50 | 50.00 | 47.00 |
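
The page does not say how the "Average Score" column is derived. The reported values appear consistent with an unweighted mean of the nine subtask scores, and the following minimal Python sketch checks that assumption on three rows copied from the table. The names `SUBTASKS`, `ROWS`, and `mean_score` are illustrative only and are not part of any benchmark tooling.

```python
# Column order of the nine subtasks, as in the comparison table.
SUBTASKS = ["GC-mat", "GC-trk", "OC-cnt", "OC-cpr", "OC-grp",
            "PC-VID", "PC-cnt", "PC-cpr", "PC-grp"]

# (reported average, nine subtask scores), copied from three table rows.
ROWS = {
    "mplug-owl3-towards-long-image-sequence":
        (37.85, [17.37, 18.26, 62.97, 49.17, 31.00, 13.50, 58.86, 63.50, 26.00]),
    "qwen2-5-vl-technical-report":
        (54.82, [35.91, 43.38, 41.72, 71.39, 47.50, 46.50, 57.98, 80.00, 69.00]),
    "gpt-4o-system-card":
        (60.36, [37.45, 39.27, 80.62, 74.17, 57.50, 66.75, 90.50, 50.00, 47.00]),
}


def mean_score(scores):
    """Unweighted mean of the subtask scores.

    Assumption: the source does not state the averaging formula.
    """
    return sum(scores) / len(scores)


for name, (reported, scores) in ROWS.items():
    # Every row should carry exactly one score per subtask column.
    assert len(scores) == len(SUBTASKS)
    print(f"{name}: recomputed {mean_score(scores):.2f}, reported {reported:.2f}")
```

For these three rows the recomputed mean agrees with the reported average to two decimal places. This suggests each subtask is weighted equally regardless of how many questions it contains, although the page itself does not confirm that.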