Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

Abstract
To effectively exploit the potential of large-scale models, various pre-training strategies supported by massive data from different sources are proposed, including supervised pre-training, weakly-supervised pre-training, and self-supervised pre-training. It has been proved that combining multiple pre-training strategies and data from various modalities/sources can greatly boost the training of large-scale models. However, current works adopt a multi-stage pre-training system, where the complex pipeline may increase the uncertainty and instability of the pre-training. It is thus desirable that these strategies can be integrated in a single-stage manner. In this paper, we first propose a general multi-modal mutual information formula as a unified optimization target and demonstrate that all existing approaches are special cases of our framework. Under this unified perspective, we propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training). Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, COCO object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation. Notably, we successfully pre-train a billion-level parameter image backbone and achieve state-of-the-art performance on various benchmarks. Code shall be released at https://github.com/OpenGVLab/M3I-Pretraining.
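
For orientation, the optimization target named in the abstract builds on mutual information. The block below is only the textbook definition for two random variables; the symbols $X$ and $Y$ are assumptions chosen for illustration and are not the paper's notation, and the paper's own multi-modal formula is not reproduced here.

```latex
I(X; Y)
  = \mathbb{E}_{p(x, y)}\!\left[ \log \frac{p(x, y)}{p(x)\, p(y)} \right]
  = H(X) - H(X \mid Y)
```

Read this way, maximizing $I(X; Y)$ amounts to making one variable as predictable as possible from the other, which is the generic sense in which a mutual-information objective can couple two views or modalities.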
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| image-classification-on-imagenet | M3I Pre-training (InternImage-H) | Top 1 Accuracy: 89.6% |
| object-detection-on-coco | M3I Pre-training (InternImage-H) | box mAP: 65.4 |
| object-detection-on-coco-minival | M3I Pre-training (InternImage-H) | box AP: 65.0 |
| object-detection-on-lvis-v1-0-minival | M3I Pre-training (InternImage-H, single-scale) | box AP: 65.8 |
| semantic-segmentation-on-ade20k | M3I Pre-training (InternImage-H) | Params (M): 1310; Validation mIoU: 62.9 |