HyperAI超神经
Image Classification on ImageNet ReaL
Evaluation metrics: Accuracy, Params

Benchmark results: performance of each model on this benchmark.

| Model | Accuracy | Params | Paper Title |
| --- | --- | --- | --- |
| Baseline (ViT-G/14) | 91.78% | - | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| Model soups (ViT-G/14) | 91.20% | 1843M | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| ViTAE-H (MAE, 512) | 91.2% | 644M | ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond |
| Meta Pseudo Labels (EfficientNet-B6-Wide) | 91.12% | - | Meta Pseudo Labels |
| MAWS (ViT-6.5B) | 91.1% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| TokenLearner L/8 (24+11) | 91.05% | 460M | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? |
| Model soups (BASIC-L) | 91.03% | 2440M | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| Meta Pseudo Labels (EfficientNet-L2) | 91.02% | - | Meta Pseudo Labels |
| FixEfficientNet-L2 | 90.9% | 480M | Fixing the train-test resolution discrepancy: FixEfficientNet |
| MAWS (ViT-2B) | 90.9% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| ViT-G/14 | 90.81% | - | Scaling Vision Transformers |
| MAWS (ViT-H) | 90.8% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| SWAG (RegNetY 128GF) | 90.7% | - | Revisiting Weakly Supervised Pre-Training of Visual Perception Models |
| VOLO-D5 | 90.6% | - | VOLO: Vision Outlooker for Visual Recognition |
| CvT-W24 (384 res, ImageNet-22k pretrain) | 90.6% | - | CvT: Introducing Convolutions to Vision Transformers |
| EfficientNet-L2 | 90.55% | 480M | Self-training with Noisy Student improves ImageNet classification |
| BiT-L | 90.54% | 928M | Big Transfer (BiT): General Visual Representation Learning |
| VOLO-D4 | 90.5% | - | VOLO: Vision Outlooker for Visual Recognition |
| CAIT-M36-448 | 90.2% | - | - |
| Mixer-H/14-448 (JFT-300M pre-train) | 90.18% | 409M | MLP-Mixer: An all-MLP Architecture for Vision |
(First 20 of 57 entries shown.)
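For context on the Accuracy column: ImageNet-ReaL ("Reassessed Labels") replaces each validation image's single label with a set of acceptable labels, and a top-1 prediction counts as correct if it falls in that set; images left with no valid labels are excluded. A minimal sketch of that scoring rule, with a hypothetical toy example (the function name and data are illustrative, not from any particular library):

```python
def real_accuracy(predictions, real_label_sets):
    """ImageNet-ReaL-style accuracy: a top-1 prediction is correct if it
    appears in the image's set of reassessed labels. Images whose label
    set is empty are excluded from the denominator."""
    scored = [pred in labels
              for pred, labels in zip(predictions, real_label_sets)
              if labels]  # skip images with no valid ReaL labels
    return sum(scored) / len(scored)

# Toy example: 3 images; the third has an empty ReaL label set.
preds = [7, 2, 9]
real = [{7, 12}, {4}, set()]
print(real_accuracy(preds, real))  # 1 correct out of 2 scored images -> 0.5
```

This is why ReaL accuracy can exceed the original ImageNet top-1 accuracy for the same model: predictions that disagree with the original single label but match another valid object in the image are no longer penalized.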