Masked Autoencoders Are Scalable Vision Learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

Abstract

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
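The masking step described in the abstract can be illustrated with a minimal sketch. It only shows the bookkeeping: which patches the encoder would see, and how a full-length sequence could be rebuilt for the decoder using mask tokens. The 224x224 image size, the 16x16 patch size, the function names `random_masking` and `decoder_input`, and the string stand-ins for latents are assumptions made for illustration; they are not taken from the abstract or from the official code.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) by uniform random sampling."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    shuffled = list(range(num_patches))
    rng.shuffle(shuffled)
    visible = sorted(shuffled[:num_keep])
    masked = sorted(shuffled[num_keep:])
    return visible, masked


def decoder_input(encoded_visible, visible, num_patches, mask_token="[MASK]"):
    """Rebuild a full-length sequence: latents at visible positions, mask token elsewhere."""
    full = [mask_token] * num_patches
    for idx, latent in zip(visible, encoded_visible):
        full[idx] = latent
    return full


# Assumed sizes: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
num_patches = (224 // 16) ** 2
visible, masked = random_masking(num_patches, mask_ratio=0.75, seed=0)

# Stand-in for the encoder: it would only ever receive the visible patches.
latents = [f"z{idx}" for idx in visible]
sequence = decoder_input(latents, visible, num_patches)

print(len(visible), len(masked), len(sequence))  # prints: 49 147 196
```

With a 75% mask ratio the encoder processes 49 of the 196 patches under these assumed sizes, which is consistent with the abstract's point that running the encoder on the visible subset alone reduces training cost.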

Code Repositories

ariG23498/mae-scalable-vision-learners, Mentioned in GitHub
islamosmanubc/MedMAE, pytorch, Mentioned in GitHub
keytoyze/visionts, pytorch, Mentioned in GitHub
alicebizeul/pmae, pytorch, Mentioned in GitHub
2023-MindSpore-1/ms-code-206, mindspore
xplip/pixel, pytorch, Mentioned in GitHub
qiaopTDUN/mae-repo, pytorch, Mentioned in GitHub
guilk/vlc, pytorch, Mentioned in GitHub
lightly-ai/lightly, pytorch, Mentioned in GitHub
Nullius-2020/MAE-Paddle, paddle, Mentioned in GitHub
facebookresearch/vip-mae, pytorch, Mentioned in GitHub
aHapBean/PCP-MAE, pytorch, Mentioned in GitHub
Westlake-AI/openmixup, pytorch, Mentioned in GitHub
Ugenteraan/Masked-AutoEncoder-PyTorch, pytorch, Mentioned in GitHub
2023-MindSpore-4/Code12/tree/main/MindFormers/mae, mindspore
FlyEgle/MAE-pytorch, pytorch, Mentioned in GitHub
leaplabthu/efficienttrain, pytorch, Mentioned in GitHub
zinengtang/tvlt, pytorch, Mentioned in GitHub
zhangq327/u-mae, pytorch, Mentioned in GitHub
yongyupei/papers_with_examps/tree/main/mae, mindspore
SnailDev/github-hot-hub, pytorch, Mentioned in GitHub
PatrickHua/SimpleMAE, pytorch
pengzhiliang/MAE-pytorch, pytorch, Mentioned in GitHub
yangyucheng000/mae, mindspore
dispink/xpt, pytorch, Mentioned in GitHub
alibaba/EasyCV, pytorch
virajprabhu/pacmac, pytorch, Mentioned in GitHub
BUPT-PRIV/MAE-priv, pytorch, Mentioned in GitHub
wangyz1608/knowledge-distillation-via-nd, pytorch, Mentioned in GitHub
yifanzhang-pro/m-mae, pytorch, Mentioned in GitHub
mx-mark/videotransformer-pytorch, pytorch, Mentioned in GitHub
https://gitlab.com/birder/birder, pytorch
facebookresearch/hiera, pytorch, Mentioned in GitHub
oneflow-inc/libai, Mentioned in GitHub
facebookresearch/mae, Official, pytorch, Mentioned in GitHub
innat/VideoMAE, Mentioned in GitHub
2020132075/conmae, pytorch, Mentioned in GitHub
dravenww/curated-article, Mentioned in GitHub
keras-team/keras-io/blob/master/examples/vision/masked_image_modeling.py
DarshanDeshpande/jax-models, jax, Mentioned in GitHub
bwconrad/masked-autoencoder, pytorch, Mentioned in GitHub
liujiyuan13/MAE-code, pytorch, Mentioned in GitHub
IcarusWizard/MAE, pytorch, Mentioned in GitHub
isaaccorley/hydro-foundation-model, pytorch, Mentioned in GitHub
kit-mrt/masked-fusion-360, pytorch, Mentioned in GitHub
dominickrei/limited-data-vits, pytorch, Mentioned in GitHub
wangsr126/mae-lite, pytorch, Mentioned in GitHub
nasa-impact/hls-foundation-os, pytorch, Mentioned in GitHub
three0-s/MAE-keras
yangyucheng000/MAE-2, mindspore
hkbu-vscomputing/2022_mm_dmae-mocap, pytorch, Mentioned in GitHub
yangsun22/tc-moa, pytorch, Mentioned in GitHub
lonnyzhang423/github-hot-hub, pytorch, Mentioned in GitHub
open-mmlab/mmselfsup, pytorch, Mentioned in GitHub
0jason000/mae_vit, mindspore, Mentioned in GitHub
flytocc/mae-paddle, paddle
Asthestarsfalll/MAE-MegEngine, pytorch
facebookresearch/multimodal, pytorch, Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
domain-generalization-on-imagenet-a	MAE (ViT-H, 448)	Top-1 accuracy %: 76.7
domain-generalization-on-imagenet-c	MAE (ViT-H)	Number of params: 632M; mean Corruption Error (mCE): 33.8
domain-generalization-on-imagenet-r	MAE (ViT-H, 448)	Top-1 Error Rate: 33.5
domain-generalization-on-imagenet-sketch	MAE (ViT-H, 448)	Top-1 accuracy: 50.9
image-classification-on-imagenet	MAE (ViT-L)	Top 1 Accuracy: 85.9%
image-classification-on-imagenet	MAE (ViT-H, 448)	Number of params: 656M; Top 1 Accuracy: 87.8%
image-classification-on-imagenet	MAE (ViT-L)	Top 1 Accuracy: 83.6%
image-classification-on-imagenet	MAE (ViT-H)	Top 1 Accuracy: 86.9%
image-classification-on-inaturalist	MAE (ViT-H, 448)	Top 1 Accuracy: 83.4
image-classification-on-inaturalist-2018	MAE (ViT-H, 448)	Top-1 Accuracy: 86.8%
image-classification-on-inaturalist-2019	MAE (ViT-H, 448)	Top-1 Accuracy: 88.3
image-classification-on-omnibenchmark	MAE	Average Top-1 Accuracy: 30.6
image-classification-on-places205	MAE (ViT-H, 448)	Top 1 Accuracy: 66.8
image-classification-on-places365-standard	MAE (ViT-H, 448)	Top 1 Accuracy: 60.3
object-detection-on-coco-minival	MAE (ViT-L, Mask R-CNN)	box AP: 53.3
object-detection-on-coco-minival	MAE (ViT-B, Mask R-CNN)	box AP: 50.3
self-supervised-image-classification-on	MAE (ViT-B)	Number of Params: 80M; Top 1 Accuracy: 68.0%
self-supervised-image-classification-on	MAE (ViT-L)	Number of Params: 306M; Top 1 Accuracy: 75.8%
self-supervised-image-classification-on	MAE (ViT-H)	Number of Params: 700M; Top 1 Accuracy: 76.6%
self-supervised-image-classification-on-1	MAE (ViT-H/14)	Top 1 Accuracy: 86.9%
self-supervised-image-classification-on-1	MAE (ViT-H/14, 448)	Number of Params: 632M; Top 1 Accuracy: 87.8%
semantic-segmentation-on-ade20k	MAE (ViT-B, UperNet)	Validation mIoU: 48.1
semantic-segmentation-on-ade20k	MAE (ViT-L, UperNet)	Validation mIoU: 53.6
semantic-segmentation-on-imagenet-s	MAE (ViT-B/16, 224x224, SSL+FT)	mIoU (test): 60.2; mIoU (val): 61.0
semantic-segmentation-on-imagenet-s	MAE (ViT-B/16, 224x224, SSL)	mIoU (test): 37.0; mIoU (val): 38.3
semantic-segmentation-on-imagenet-s	MAE (ViT-B/16, 224x224, SSL, mmseg)	mIoU (test): 40.3; mIoU (val): 40.0
semantic-segmentation-on-imagenet-s	MAE (ViT-B/16, 224x224, SSL+FT, mmseg)	mIoU (test): 61.2; mIoU (val): 61.6
