Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang

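Abstract
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series that builds on InternVL 2.0. It keeps the core model architecture of its predecessor while significantly improving training and testing strategies and data quality. In this work, we examine the relationship between model scale and performance, systematically analyzing performance trends across vision encoders, language models, dataset sizes, and test-time configurations. In extensive evaluations on a wide range of benchmarks, covering multi-discipline reasoning, document understanding, multi-image and video understanding, real-world comprehension, multimodal hallucination detection, visual grounding, multilingual capabilities, and pure language processing, InternVL 2.5 shows competitive performance, rivaling leading commercial models such as GPT-4o and Claude-3.5-Sonnet. Notably, our model is the first open-source MLLM to surpass 70% on the MMMU benchmark, gaining 3.7 percentage points from Chain-of-Thought (CoT) reasoning and showing strong potential for test-time scaling. We hope this model contributes to the open-source community by setting new standards for developing and applying multimodal AI systems. HuggingFace demo: https://huggingface.co/spaces/OpenGVLab/InternVL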
Code Repositories
opengvlab/internvl
Official
pytorch
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| video-question-answering-on-next-qa | InternVL-2.5(8B) | Accuracy: 85.5 |
| visual-question-answering-on-mm-vet | InternVL2.5-78B | GPT-4 score: 72.3; Params: 78B |
| visual-question-answering-on-mm-vet | InternVL2.5-38B | GPT-4 score: 68.8; Params: 38B |
| visual-question-answering-on-mm-vet | InternVL2.5-26B | GPT-4 score: 65.0; Params: 26B |
| visual-question-answering-on-mm-vet | InternVL2.5-2B | GPT-4 score: 60.8; Params: 2B |
| visual-question-answering-on-mm-vet | InternVL2.5-1B | GPT-4 score: 48.8; Params: 1B |
| visual-question-answering-on-mm-vet | InternVL2.5-4B | GPT-4 score: 60.6; Params: 4B |
| visual-question-answering-on-mm-vet | InternVL2.5-8B | GPT-4 score: 62.8; Params: 8B |
| visual-question-answering-vqa-on-vlm2-bench | InternVL2.5-26B | Average Score on VLM2-bench (9 subtasks): 45.59; GC-mat: 30.50; GC-trk: 30.59; OC-cnt: 51.48; OC-cpr: 43.33; OC-grp: 52.50; PC-VID: 21.75; PC-cnt: 59.70; PC-cpr: 59.50; PC-grp: 61.00 |
| visual-question-answering-vqa-on-vlm2-bench | InternVL2.5-8B | Average Score on VLM2-bench (9 subtasks): 41.23; GC-mat: 21.24; GC-trk: 26.03; OC-cnt: 55.23; OC-cpr: 53.33; OC-grp: 46.50; PC-VID: 5.25; PC-cnt: 60.00; PC-cpr: 51.50; PC-grp: 52.00 |
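
The VLM2-bench rows report an average over nine subtasks alongside the individual subtask scores. The following is a minimal sketch for checking how that average relates to the subtasks, assuming it is the unweighted mean of the nine scores rounded to two decimals (the table does not state the aggregation rule). The dictionary layout and the helper name `subtask_average` are illustrative choices, not part of any benchmark tooling; the numbers are copied from the table.

```python
from statistics import mean

# Subtask scores copied from the two VLM2-bench rows in the table above.
VLM2_BENCH = {
    "InternVL2.5-26B": {
        "GC-mat": 30.50, "GC-trk": 30.59,
        "OC-cnt": 51.48, "OC-cpr": 43.33, "OC-grp": 52.50,
        "PC-VID": 21.75, "PC-cnt": 59.70, "PC-cpr": 59.50, "PC-grp": 61.00,
    },
    "InternVL2.5-8B": {
        "GC-mat": 21.24, "GC-trk": 26.03,
        "OC-cnt": 55.23, "OC-cpr": 53.33, "OC-grp": 46.50,
        "PC-VID": 5.25, "PC-cnt": 60.00, "PC-cpr": 51.50, "PC-grp": 52.00,
    },
}

# Averages as reported in the table.
REPORTED_AVERAGE = {"InternVL2.5-26B": 45.59, "InternVL2.5-8B": 41.23}


def subtask_average(scores):
    """Unweighted mean of the subtask scores, rounded to two decimals.

    Assumption: every subtask carries equal weight.
    """
    return round(mean(scores.values()), 2)


if __name__ == "__main__":
    for model, scores in VLM2_BENCH.items():
        computed = subtask_average(scores)
        print(f"{model}: computed={computed:.2f} reported={REPORTED_AVERAGE[model]:.2f}")
```

Under this assumption the computed means match the reported averages for both models, which suggests that the three subtask groups (GC, OC, PC) are not weighted separately.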