Qishuai Diao, Yi Jiang, Bin Wen, Jia Sun, Zehuan Yuan

Abstract
Fine-Grained Visual Classification (FGVC) is the task of recognizing objects that belong to multiple subordinate categories of a super-category. Recent state-of-the-art methods usually design sophisticated learning pipelines to tackle this task. However, visual information alone is often not sufficient to accurately differentiate between fine-grained visual categories. Nowadays, meta-information (e.g., spatio-temporal priors, attributes, and text descriptions) often accompanies the images. This inspires us to ask: is it possible to use a unified and simple framework to exploit various kinds of meta-information to assist fine-grained identification? To answer this question, we explore a unified and strong meta-framework (MetaFormer) for fine-grained visual classification. In practice, MetaFormer provides a simple yet effective approach to the joint learning of vision and various meta-information. In addition, MetaFormer provides a strong baseline for FGVC without bells and whistles. Extensive experiments demonstrate that MetaFormer can effectively use various meta-information to improve the performance of fine-grained recognition. In a fair comparison, MetaFormer outperforms the current SotA approaches on the iNaturalist2017 and iNaturalist2018 datasets using only vision information. With meta-information added, MetaFormer exceeds the current SotA approaches by 5.9% and 5.3%, respectively. Moreover, MetaFormer achieves 92.3% and 92.7% on CUB-200-2011 and NABirds, respectively, significantly outperforming the SotA approaches. The source code and pre-trained models are released at https://github.com/dqshuai/MetaFormer.
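The abstract describes the joint learning of vision and meta-information only at a high level. The following is a minimal sketch of one plausible reading, assuming that meta-information is mapped to an extra token by a small non-linear embedding (an MLP) and fused with a class token and the vision tokens through self-attention. Everything here is a hypothetical toy, not code from the released MetaFormer repository: the names `encode_meta`, `meta_token`, and `fuse`, the sin/cos spatio-temporal encoding, the token width, the single attention layer, and the random weights are all illustrative assumptions.

```python
import numpy as np

DIM = 8  # hypothetical token width, far smaller than in a real model


def encode_meta(lat_deg, lon_deg, month):
    """Hypothetical spatio-temporal prior: map latitude, longitude and month
    onto the unit circle with sin/cos so that cyclic values (Dec/Jan,
    -180/180 degrees) end up close together."""
    lat, lon = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
    season = 2.0 * np.pi * (month - 1) / 12.0
    return np.array([np.sin(lat), np.cos(lat),
                     np.sin(lon), np.cos(lon),
                     np.sin(season), np.cos(season)])


def meta_token(meta, w1, b1, w2, b2):
    """Assumed non-linear embedding: a two-layer MLP with ReLU that maps the
    raw meta vector to a single token of width DIM."""
    hidden = np.maximum(0.0, meta @ w1 + b1)
    return hidden @ w2 + b2


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v


def fuse(vision_tokens, meta, params):
    """Concatenate [class token, meta token, vision tokens], apply one
    residual attention layer, and return the updated class token."""
    cls = params["cls"][None, :]
    meta_tok = meta_token(meta, *params["mlp"])[None, :]
    tokens = np.concatenate([cls, meta_tok, vision_tokens], axis=0)
    tokens = tokens + self_attention(tokens, *params["attn"])
    return tokens[0]


# Random weights stand in for trained parameters (illustration only).
rng = np.random.default_rng(0)
params = {
    "cls": rng.normal(size=DIM),
    "mlp": (rng.normal(size=(6, 16)), np.zeros(16),
            rng.normal(size=(16, DIM)), np.zeros(DIM)),
    "attn": tuple(rng.normal(size=(DIM, DIM)) / np.sqrt(DIM) for _ in range(3)),
}
vision_tokens = rng.normal(size=(49, DIM))  # e.g. a flattened 7x7 feature map
meta = encode_meta(lat_deg=37.8, lon_deg=-122.4, month=5)

cls_out = fuse(vision_tokens, meta, params)
w_head = rng.normal(size=(DIM, 10))  # hypothetical 10-class classifier head
logits = cls_out @ w_head
print(logits.argmax())
```

Under this assumption, the meta token is just one more token, so the same attention layers could consume location, attributes, or text embeddings without modality-specific fusion code, and dropping the meta token leaves a vision-only model.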
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| fine-grained-image-classification-on-cub-200 | MetaFormer (MetaFormer-2,384) | Accuracy: 92.9% |
| fine-grained-image-classification-on-nabirds | MetaFormer (MetaFormer-2,384) | Accuracy: 93.0% |
| image-classification-on-inaturalist | MetaFormer (MetaFormer-2,384,extra_info) | Top-1 Accuracy: 83.4% |
| image-classification-on-inaturalist | MetaFormer (MetaFormer-2,384) | Top-1 Accuracy: 80.4% |
| image-classification-on-inaturalist-2018 | MetaFormer (MetaFormer-2,384) | Top-1 Accuracy: 84.3% |
| image-classification-on-inaturalist-2018 | MetaFormer (MetaFormer-2,384,extra_info) | Top-1 Accuracy: 88.7% |