HyperAI
Image Classification on ImageNet ReaL
Metrics: Accuracy, Params
Results

Performance results of various models on this benchmark.

| Model Name                           | Accuracy | Params | Paper Title                                                                                                  |
|--------------------------------------|----------|--------|--------------------------------------------------------------------------------------------------------------|
| BiT-L                                | 90.54%   | 928M   | Big Transfer (BiT): General Visual Representation Learning                                                   |
| MAWS (ViT-6.5B)                      | 91.1%    | -      | The effectiveness of MAE pre-pretraining for billion-scale pretraining                                       |
| ResMLP-36                            | 85.6%    | 45M    | ResMLP: Feedforward networks for image classification with data-efficient training                           |
| Assemble ResNet-50                   | 87.82%   | -      | Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network           |
| ResMLP-B24/8 (22k)                   | -        | -      | ResMLP: Feedforward networks for image classification with data-efficient training                           |
| BiT-M                                | 89.02%   | -      | Big Transfer (BiT): General Visual Representation Learning                                                   |
| Model soups (ViT-G/14)               | 91.20%   | 1843M  | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| CeiT-T                               | 83.6%    | -      | Incorporating Convolution Designs into Visual Transformers                                                   |
| TokenLearner L/8 (24+11)             | 91.05%   | 460M   | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?                                            |
| Meta Pseudo Labels (EfficientNet-L2) | 91.02%   | -      | Meta Pseudo Labels                                                                                           |
| ViTAE-H (MAE, 512)                   | 91.2%    | 644M   | ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond            |
| Model soups (BASIC-L)                | 91.03%   | 2440M  | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| FixResNeXt-101 32x48d                | 89.73%   | 829M   | Fixing the train-test resolution discrepancy                                                                 |
| LeViT-384                            | 87.5%    | -      | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference                                       |
| ViT-L @384 (DeiT III, 21k)           | -        | -      | DeiT III: Revenge of the ViT                                                                                 |
| VOLO-D5                              | 90.6%    | -      | VOLO: Vision Outlooker for Visual Recognition                                                                |
| ResMLP-12                            | 84.6%    | 15M    | ResMLP: Feedforward networks for image classification with data-efficient training                           |
| NASNet-A Large                       | 87.56%   | -      | Learning Transferable Architectures for Scalable Image Recognition                                           |
| Assemble-ResNet152                   | 88.65%   | -      | Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network           |
| DeiT-Ti                              | 82.1%    | 5M     | Training data-efficient image transformers & distillation through attention                                  |