Image Classification on ImageNet V2
Metric: Top-1 Accuracy (%)

Performance results of various models on this benchmark:
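As context for the metric used below, here is a minimal sketch of how Top-1 accuracy is computed: the fraction of samples whose highest-scoring predicted class matches the true label. This is illustrative only; the function and variable names are not taken from any benchmark codebase.

```python
def top1_accuracy(logits, labels):
    """Top-1 accuracy in percent.

    logits: list of per-class score lists (one list per sample).
    labels: list of true class indices.
    """
    correct = sum(
        1
        for scores, label in zip(logits, labels)
        # argmax over class scores == predicted class
        if max(range(len(scores)), key=scores.__getitem__) == label
    )
    return 100.0 * correct / len(labels)


# 3 samples, 4 classes: predicted classes are 2, 0, 1; labels are 2, 0, 3.
logits = [[0.1, 0.2, 0.6, 0.1],
          [0.7, 0.1, 0.1, 0.1],
          [0.2, 0.5, 0.2, 0.1]]
labels = [2, 0, 3]
print(f"{top1_accuracy(logits, labels):.2f}")  # 2 of 3 correct -> 66.67
```

On ImageNet V2 this is evaluated over the full test set rather than three toy samples, but the computation is the same.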
| Model Name | Top-1 Accuracy (%) | Paper Title |
|---|---|---|
| ResMLP-S24/16 | 69.8 | ResMLP: Feedforward networks for image classification with data-efficient training |
| ResMLP-S12/16 | 66.0 | ResMLP: Feedforward networks for image classification with data-efficient training |
| Mixer-B/8-SAM | 65.5 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| MAWS (ViT-6.5B) | 84.0 | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| LeViT-256 | 69.9 | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |
| CaiT-M36-448 | 76.7 | Going deeper with Image Transformers |
| ViT-B-36x1 | 73.9 | Three things everyone should know about Vision Transformers |
| MOAT-1 (IN-22K pretraining) | 78.4 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| ResNet50 (A1) | 68.7 | ResNet strikes back: An improved training procedure in timm |
| Discrete Adversarial Distillation (ViT-B, 224) | 71.7 | Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models |
| SwinV2-B | 78.08 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| Model soups (ViT-G/14) | 84.22 | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| ViT-B/16-SAM | 67.5 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| LeViT-192 | 68.7 | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |
| SEER (RegNet10B) | 76.2 | Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision |
| MOAT-2 (IN-22K pretraining) | 79.3 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| VOLO-D4 | 77.8 | VOLO: Vision Outlooker for Visual Recognition |
| MOAT-3 (IN-22K pretraining) | 80.6 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| SwinV2-G | 84.00 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| ResMLP-B24/8 | 73.4 | ResMLP: Feedforward networks for image classification with data-efficient training |
(20 of 33 leaderboard results shown.)