
Image Classification on ImageNet-V2

Metrics

Top-1 Accuracy
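Top-1 accuracy counts a prediction as correct only when the model's single highest-scoring class matches the ground-truth label. A minimal sketch of the computation (the function name and toy data are illustrative, not from any specific benchmark harness):

```python
def top1_accuracy(predicted_classes, labels):
    """Fraction of samples whose top-scoring predicted class equals the label."""
    correct = sum(1 for p, y in zip(predicted_classes, labels) if p == y)
    return correct / len(labels)

# Toy example: 3 of 4 predictions match the labels.
print(top1_accuracy([3, 1, 2, 0], [3, 1, 2, 1]))  # 0.75
```

In practice the predicted class is taken as the argmax over the model's class scores for each image before applying this count.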

Results

Top-1 accuracy of various models on the ImageNet-V2 benchmark.

| Model Name | Top-1 Accuracy (%) | Paper Title |
| --- | --- | --- |
| ResMLP-S24/16 | 69.8 | ResMLP: Feedforward networks for image classification with data-efficient training |
| ResMLP-S12/16 | 66.0 | ResMLP: Feedforward networks for image classification with data-efficient training |
| Mixer-B/8-SAM | 65.5 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| MAWS (ViT-6.5B) | 84.0 | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| LeViT-256 | 69.9 | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |
| CAIT-M36-448 | 76.7 | Going deeper with Image Transformers |
| ViT-B-36x1 | 73.9 | Three things everyone should know about Vision Transformers |
| MOAT-1 (IN-22K pretraining) | 78.4 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| ResNet50 (A1) | 68.7 | ResNet strikes back: An improved training procedure in timm |
| Discrete Adversarial Distillation (ViT-B, 224) | 71.7 | Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models |
| SwinV2-B | 78.08 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| Model soups (ViT-G/14) | 84.22 | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| ViT-B/16-SAM | 67.5 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| LeViT-192 | 68.7 | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |
| SEER (RegNet10B) | 76.2 | Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision |
| MOAT-2 (IN-22K pretraining) | 79.3 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| VOLO-D4 | 77.8 | VOLO: Vision Outlooker for Visual Recognition |
| MOAT-3 (IN-22K pretraining) | 80.6 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| SwinV2-G | 84.00 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| ResMLP-B24/8 | 73.4 | ResMLP: Feedforward networks for image classification with data-efficient training |