Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Abstract
Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actually makes these transformers slower than their vanilla ViT counterparts. In this paper, we argue that this additional bulk is unnecessary. By pretraining with a strong visual pretext task (MAE), we can strip out all the bells-and-whistles from a state-of-the-art multi-stage vision transformer without losing accuracy. In the process, we create Hiera, an extremely simple hierarchical vision transformer that is more accurate than previous models while being significantly faster both at inference and during training. We evaluate Hiera on a variety of tasks for image and video recognition. Our code and models are available at https://github.com/facebookresearch/hiera.
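The abstract credits MAE pretraining with making the extra components unnecessary. MAE hides a large random subset of the input patches and trains the model to reconstruct them, so the encoder only ever sees the visible ones. The sketch below shows that masking step in isolation. It is a minimal illustration of the generic token-level recipe from the MAE paper, not code from the Hiera repository. The function name `random_mask` and the 0.75 ratio (the MAE paper's default for images) are assumptions chosen for illustration.

```python
import random
from typing import List, Optional, Tuple


def random_mask(
    num_tokens: int, mask_ratio: float, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Split token indices into (visible, masked) lists, MAE-style.

    Hypothetical helper for illustration only. The encoder would process
    just the visible tokens, and a decoder would reconstruct the masked ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Number of tokens kept visible, rounded down as in MAE.
    num_keep = int(num_tokens * (1 - mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)  # uniform random permutation of token indices
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


# A 224x224 image split into 16x16 patches gives 14 * 14 = 196 tokens.
visible, masked = random_mask(196, 0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

With 75% of the tokens dropped, the encoder runs on only a quarter of the sequence during pretraining. This is part of why MAE pretraining is cheap relative to the model size.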
Benchmarks
| Benchmark | Model | Metric |
|---|---|---|
| action-classification-on-kinetics-400 | Hiera-H (no extra data) | Top-1 Accuracy: 87.8% |
| action-classification-on-kinetics-600 | Hiera-H (no extra data) | Top-1 Accuracy: 88.8% |
| action-classification-on-kinetics-700 | Hiera-H (no extra data) | Top-1 Accuracy: 81.1% |
| action-recognition-in-videos-on-something | Hiera-L (no extra data) | Top-1 Accuracy: 76.5% |
| action-recognition-on-ava-v2-2 | Hiera-H (K700 PT+FT) | mAP: 43.3 |
| image-classification-on-imagenet | Hiera-H | Top-1 Accuracy: 86.9% |
| image-classification-on-inaturalist | Hiera-H (448px) | Top-1 Accuracy: 83.8% |
| image-classification-on-inaturalist-2018 | Hiera-H (448px) | Top-1 Accuracy: 87.3% |
| image-classification-on-inaturalist-2019 | Hiera-H (448px) | Top-1 Accuracy: 88.5% |
| image-classification-on-places365-standard | Hiera-H (448px) | Top-1 Accuracy: 60.6% |
| instance-segmentation-on-coco-minival | Hiera-L | Mask AP: 48.6 |
| object-detection-on-coco-minival | Hiera-L | Box AP: 55 |