Visual Question Answering on MM-Vet v2
Evaluation Metric
GPT-4 score
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | GPT-4 score |
---|---|
Model 1 | 45.5±0.1 |
Model 2 | 68.4±0.3 |
claude-3-5-sonnet-model-card-addendum | 71.8±0.2 |
cogvlm-visual-expert-for-pretrained-language | 45.1±0.2 |
Model 5 | 50.9±0.1 |
qwen-vl-a-frontier-large-vision-language | 55.8±0.2 |
generative-multimodal-models-are-in-context | 38.0±0.1 |
mimic-it-multi-modal-in-context-instruction | 23.2±0.1 |
Model 9 | 77.1±0.1 |
Model 10 | 63.8±0.2 |
gemini-a-family-of-highly-capable-multimodal-1 | 57.2±0.2 |
openflamingo-an-open-source-framework-for | 17.6±0.2 |
gpt-4-technical-report-1 | 72.1±0.2 |
improved-baselines-with-visual-instruction | 33.2±0.1 |
Model 15 | 55.8±0.2 |
how-far-are-we-to-gpt-4v-closing-the-gap-to | 51.5±0.2 |
internlm-xcomposer2-mastering-free-form-text | 42.5±0.3 |
gpt-4-technical-report-1 | 71.0±0.2 |
improved-baselines-with-visual-instruction | 28.3±0.2 |
qwen2-vl-enhancing-vision-language-model-s | 66.9±0.3 |
gpt-4-technical-report-1 | 66.8±0.3 |
cogagent-a-visual-language-model-for-gui | 34.7±0.2 |
gpt-4-technical-report-1 | 66.3±0.2 |
gemini-1-5-unlocking-multimodal-understanding | 66.9±0.2 |