mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye Haiyang Xu Jiabo Ye Ming Yan Anwen Hu Haowei Liu Qi Qian Ji Zhang Fei Huang Jingren Zhou

Abstract
Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction-following abilities across various open-ended tasks. However, previous methods primarily focus on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal large language model, mPLUG-Owl2, which effectively leverages modality collaboration to improve performance in both text and multi-modal tasks. mPLUG-Owl2 utilizes a modularized network design, with the language decoder acting as a universal interface for managing different modalities. Specifically, mPLUG-Owl2 incorporates shared functional modules to facilitate modality collaboration and introduces a modality-adaptive module that preserves modality-specific features. Extensive experiments reveal that mPLUG-Owl2 generalizes to both text and multi-modal tasks and achieves state-of-the-art performance with a single generic model. Notably, mPLUG-Owl2 is the first MLLM to demonstrate the modality collaboration phenomenon in both pure-text and multi-modal scenarios, setting a pioneering path in the development of future multi-modal foundation models.
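The abstract's idea of combining shared modules with a modality-adaptive module can be illustrated with a toy attention layer. This is a minimal sketch under stated assumptions, not the model's actual implementation: the abstract does not say which parameters are shared and which are modality-specific, so the split used here (one shared query projection, separate key and value projections per modality) is an illustrative choice. The function names, the `0 = text, 1 = image` coding, and all shapes are hypothetical.

```python
import numpy as np


def modality_adaptive_project(hidden, modality_ids, per_modality_weights):
    """Project each token with the weight matrix of its own modality.

    hidden: (seq_len, d_model) token features, text and image tokens mixed.
    modality_ids: (seq_len,) integers; 0 = text, 1 = image (hypothetical coding).
    per_modality_weights: (num_modalities, d_model, d_out), one matrix per modality.
    """
    # Pick one (d_model, d_out) matrix per token, then apply it token-wise.
    token_weights = per_modality_weights[modality_ids]      # (seq_len, d_model, d_out)
    return np.einsum("sd,sdo->so", hidden, token_weights)   # (seq_len, d_out)


def toy_attention(hidden, modality_ids, w_query, w_key, w_value):
    """Single-head attention without a causal mask (assumed split, see lead-in)."""
    # Shared module: every token uses the same query projection.
    q = hidden @ w_query
    # Modality-adaptive part: keys and values use per-modality weights.
    k = modality_adaptive_project(hidden, modality_ids, w_key)
    v = modality_adaptive_project(hidden, modality_ids, w_value)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v


rng = np.random.default_rng(0)
d_model = 4
hidden = rng.normal(size=(5, d_model))
modality_ids = np.array([0, 0, 1, 1, 0])        # text, text, image, image, text
w_query = rng.normal(size=(d_model, d_model))
w_key = rng.normal(size=(2, d_model, d_model))  # index 0: text, index 1: image
w_value = rng.normal(size=(2, d_model, d_model))
output = toy_attention(hidden, modality_ids, w_query, w_key, w_value)
```

In this sketch, all tokens still attend to each other through one attention map, which is where the two modalities interact, while the per-modality key and value weights keep text and image features from being forced through identical projections.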
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| long-context-understanding-on-mmneedle | mPLUG-Owl-v2 | Exact Accuracy (1 Image, 2*2 Stitching): 1.9; Exact Accuracy (1 Image, 4*4 Stitching): 0.3; Exact Accuracy (1 Image, 8*8 Stitching): 0.7; Exact Accuracy (10 Images, 1*1 Stitching): 0.4; Exact Accuracy (10 Images, 2*2 Stitching): 0.1; Exact Accuracy (10 Images, 4*4 Stitching): 0; Exact Accuracy (10 Images, 8*8 Stitching): 0 |
| visual-question-answering-on-mm-vet | mPLUG-Owl2 | GPT-4 score: 36.3±0.1; Params: 7B |
| visual-question-answering-vqa-on-core-mm | mPLUG-Owl2 | Abductive: 20.6; Analogical: 7.64; Deductive: 23.43; Overall score: 20.05; Params: 7B |