HyperAI

Image Classification On Imagenet

Metrics

Hardware Burden
Number of params
Operations per network pass
Top 1 Accuracy
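Top-1 accuracy is the fraction of validation images whose highest-scoring predicted class matches the ground-truth label. A minimal sketch of the metric, assuming predictions are given as plain per-class score lists (real evaluations use the full ImageNet validation split via a framework such as PyTorch):

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax class matches the label."""
    correct = sum(
        1 for scores, y in zip(logits, labels)
        if max(range(len(scores)), key=scores.__getitem__) == y
    )
    return correct / len(labels)

# Toy example: 3 samples, 4 classes (illustrative values only).
logits = [[0.1, 0.7, 0.1, 0.1],    # predicts class 1 -> correct
          [0.9, 0.0, 0.05, 0.05],  # predicts class 0 -> correct
          [0.2, 0.2, 0.5, 0.1]]    # predicts class 2 -> wrong (label 3)
labels = [1, 0, 3]
print(top1_accuracy(logits, labels))  # 2 of 3 correct ≈ 0.667
```

The other columns are model-level properties: parameter count and operations per forward pass are read off the architecture, while "hardware burden" aggregates the compute used to obtain the reported result.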

Results

Performance results of various models on this benchmark

| Model Name | Hardware Burden | Number of params | Operations per network pass | Top 1 Accuracy | Paper Title |
| --- | --- | --- | --- | --- | --- |
| Xception | 87G | 22.855952M | 0.838G | 79% | Xception: Deep Learning with Depthwise Separable Convolutions |
| ResNet-101 | - | 40M | - | 78.25% | Deep Residual Learning for Image Recognition |
| CvT-13 (384 res) | - | 20M | - | 83% | CvT: Introducing Convolutions to Vision Transformers |
| DenseNet-201 | - | - | - | 77.42% | Densely Connected Convolutional Networks |
| ConViT-Ti+ | - | 10M | - | 76.7% | ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases |
| ViT-B/16-SAM | - | 87M | - | 79.9% | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| PVTv2-B2 | - | 25.4M | - | 82% | PVT v2: Improved Baselines with Pyramid Vision Transformer |
| ConvFormer-S36 (224 res, 21K) | - | 40M | - | 85.4% | MetaFormer Baselines for Vision |
| MambaVision-L2 | - | 241.5M | - | 85.3% | MambaVision: A Hybrid Mamba-Transformer Vision Backbone |
| FBNetV5-AC-CLS | - | - | - | 78.4% | FBNetV5: Neural Architecture Search for Multiple Tasks in One Run |
| HCGNet-B | - | 12.9M | - | 78.5% | Gated Convolutional Networks with Hybrid Connectivity for Image Classification |
| ConViT-S+ | - | 48M | - | 82.2% | ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases |
| ResMLP-B24 + STD | - | 122.6M | - | 82.4% | Spatial-Channel Token Distillation for Vision MLPs |
| DeiT-S with iRPE-K | - | 22M | - | 80.9% | Rethinking and Improving Relative Position Encoding for Vision Transformer |
| BoTNet T6 | - | 53.9M | - | 84% | Bottleneck Transformers for Visual Recognition |
| ViT-B @224 (DeiT-III + AugSub) | - | 86.6M | - | 84.2% | Masking Augmentation for Supervised Learning |
| MaxViT-B (224 res) | - | 120M | - | 84.94% | MaxViT: Multi-Axis Vision Transformer |
| BiFormer-S* (IN1k pretrain) | - | - | - | 84.3% | BiFormer: Vision Transformer with Bi-Level Routing Attention |
| CeiT-T | - | 6.4M | - | 76.4% | Incorporating Convolution Designs into Visual Transformers |
| ResNet-101 (AutoMix) | - | 44.6M | - | 80.98% | AutoMix: Unveiling the Power of Mixup for Stronger Classifiers |
The table above shows a selection of the leaderboard's 1,058 entries.