HyperAI超神经
Image Classification on ImageNet ReaL
Evaluation metrics: Accuracy, Params

Benchmark results: performance of each model on this benchmark.

| Model | Accuracy | Params | Paper Title |
| --- | --- | --- | --- |
| Baseline (ViT-G/14) | 91.78% | - | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| Model soups (ViT-G/14) | 91.20% | 1843M | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| ViTAE-H (MAE, 512) | 91.2% | 644M | ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond |
| Meta Pseudo Labels (EfficientNet-B6-Wide) | 91.12% | - | Meta Pseudo Labels |
| MAWS (ViT-6.5B) | 91.1% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| TokenLearner L/8 (24+11) | 91.05% | 460M | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? |
| Model soups (BASIC-L) | 91.03% | 2440M | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| Meta Pseudo Labels (EfficientNet-L2) | 91.02% | - | Meta Pseudo Labels |
| FixEfficientNet-L2 | 90.9% | 480M | Fixing the train-test resolution discrepancy: FixEfficientNet |
| MAWS (ViT-2B) | 90.9% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| ViT-G/14 | 90.81% | - | Scaling Vision Transformers |
| MAWS (ViT-H) | 90.8% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| SWAG (RegNetY 128GF) | 90.7% | - | Revisiting Weakly Supervised Pre-Training of Visual Perception Models |
| VOLO-D5 | 90.6% | - | VOLO: Vision Outlooker for Visual Recognition |
| CvT-W24 (384 res, ImageNet-22k pretrain) | 90.6% | - | CvT: Introducing Convolutions to Vision Transformers |
| EfficientNet-L2 | 90.55% | 480M | Self-training with Noisy Student improves ImageNet classification |
| BiT-L | 90.54% | 928M | Big Transfer (BiT): General Visual Representation Learning |
| VOLO-D4 | 90.5% | - | VOLO: Vision Outlooker for Visual Recognition |
| CAIT-M36-448 | 90.2% | - | - |
| Mixer-H/14-448 (JFT-300M pre-train) | 90.18% | 409M | MLP-Mixer: An all-MLP Architecture for Vision |
(First 20 of 57 entries shown.)
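For context on the Accuracy column: ImageNet-ReaL ("Reassessed Labels") replaces each validation image's single label with a set of acceptable labels, and a top-1 prediction counts as correct if it falls in that set; images left with no valid labels are excluded. A minimal sketch of that scoring rule, with a hypothetical toy example (the function name and data are illustrative, not from any particular library):

```python
def real_accuracy(predictions, real_label_sets):
    """ImageNet-ReaL-style accuracy: a top-1 prediction is correct if it
    appears in the image's set of reassessed labels. Images whose label
    set is empty are excluded from the denominator."""
    scored = [pred in labels
              for pred, labels in zip(predictions, real_label_sets)
              if labels]  # skip images with no valid ReaL labels
    return sum(scored) / len(scored)

# Toy example: 3 images; the third has an empty ReaL label set.
preds = [7, 2, 9]
real = [{7, 12}, {4}, set()]
print(real_accuracy(preds, real))  # 1 correct out of 2 scored images -> 0.5
```

This is why ReaL accuracy can exceed the original ImageNet top-1 accuracy for the same model: predictions that disagree with the original single label but match another valid object in the image are no longer penalized.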