5 months ago

MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations

Alkin Benedikt ; Miklautz Lukas ; Hochreiter Sepp ; Brandstetter Johannes

Abstract

We introduce MIM (Masked Image Modeling)-Refiner, a contrastive learning boost for pre-trained MIM models. MIM-Refiner is motivated by the insight that strong representations within MIM models generally reside in intermediate layers. Accordingly, MIM-Refiner leverages multiple contrastive heads that are connected to different intermediate layers. In each head, a modified nearest neighbor objective constructs semantic clusters that capture semantic information which improves performance on downstream tasks, including off-the-shelf and fine-tuning settings.

The refinement process is short and simple - yet highly effective. Within a few epochs, we refine the features of MIM models from subpar to state-of-the-art, off-the-shelf features. Refining a ViT-H, pre-trained with data2vec 2.0 on ImageNet-1K, sets a new state-of-the-art in linear probing (84.7%) and low-shot classification among models that are pre-trained on ImageNet-1K. MIM-Refiner efficiently combines the advantages of MIM and ID objectives and compares favorably against previous state-of-the-art SSL models on a variety of benchmarks such as low-shot classification, long-tailed classification, clustering and semantic segmentation.
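The idea of attaching several contrastive heads to intermediate layers, each trained with a nearest neighbor objective, can be illustrated with a small sketch. The abstract does not spell out the modified objective, so the code below substitutes a plain NNCLR-style nearest-neighbor InfoNCE loss as a stand-in. All function names, the `queue` support set, and the temperature of 0.2 are assumptions made for illustration and are not taken from the paper or the official repository. It is written in dependency-free Python rather than PyTorch to keep it self-contained.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (guard against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest_neighbor(query, queue):
    # Queue entry with the highest cosine similarity to the query
    # (inputs are assumed to be unit-normalized already).
    return max(queue, key=lambda cand: dot(query, cand))


def nn_info_nce(view1, view2, queue, temperature=0.2):
    """NNCLR-style loss (stand-in, not the paper's exact objective).

    Each view-1 embedding is swapped for its nearest neighbor in `queue`;
    that neighbor must then pick out the matching view-2 embedding among
    all view-2 embeddings in the batch (softmax cross-entropy).
    """
    z1 = [l2_normalize(v) for v in view1]
    z2 = [l2_normalize(v) for v in view2]
    q = [l2_normalize(v) for v in queue]
    total = 0.0
    for i, anchor in enumerate(z1):
        nn = nearest_neighbor(anchor, q)
        logits = [dot(nn, other) / temperature for other in z2]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(z1)


def multi_head_loss(features_per_layer, queues, temperature=0.2):
    """Average the per-head losses.

    features_per_layer[k] is a (view1, view2) pair of embeddings from a
    hypothetical head attached to intermediate layer k; queues[k] is that
    head's support set.
    """
    losses = [
        nn_info_nce(v1, v2, queue, temperature)
        for (v1, v2), queue in zip(features_per_layer, queues)
    ]
    return sum(losses) / len(losses)
```

With toy embeddings, the loss is small when the two views of each sample agree and grows when the pairing is shuffled, which is the behavior a contrastive head is meant to reward. In a real setup each head would be a small network on top of a ViT block, and the queue would hold embeddings from earlier batches.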

Code Repositories

ml-jku/MIM-Refiner (official, PyTorch)
BenediktAlkin/vtab1k-pytorch (PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| image-clustering-on-imagenet | MIM-Refiner (D2V2-ViT-H/14) | ARI: 42.2; Accuracy: 67.3; NMI: 87.2 |
| image-clustering-on-imagenet | MIM-Refiner (MAE-ViT-H/14) | ARI: 45.5; Accuracy: 64.6; NMI: 85.3 |
| self-supervised-image-classification-on | MIM-Refiner (MAE-ViT-2B/14) | Number of Params: 1890M; Top 1 Accuracy: 84.5% |
| self-supervised-image-classification-on | MIM-Refiner (MAE-ViT-H/14) | Number of Params: 632M; Top 1 Accuracy: 83.7% |
| self-supervised-image-classification-on | MIM-Refiner (MAE-ViT-L/16) | Number of Params: 307M; Top 1 Accuracy: 82.8% |
| self-supervised-image-classification-on | MIM-Refiner (D2V2-ViT-H/14) | Number of Params: 632M; Top 1 Accuracy: 84.7% |
| self-supervised-image-classification-on | MIM-Refiner (D2V2-ViT-L/16) | Number of Params: 307M; Top 1 Accuracy: 83.5% |
