Yin Cui; Menglin Jia; Tsung-Yi Lin; Yang Song; Serge Belongie

Abstract
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
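As a minimal sketch of the re-weighting scheme described above, the snippet below computes per-class weights from the effective number of samples $(1-\beta^{n})/(1-\beta)$ and normalizes them to sum to the number of classes, as the paper does. The helper name `class_balanced_weights`, the example class counts, and the choice $\beta = 0.999$ are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples.

    Effective number: E_n = (1 - beta**n) / (1 - beta).
    The weight for each class is 1 / E_n, normalized so the
    weights sum to the number of classes.
    """
    samples_per_class = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(samples_per_class) / weights.sum()

# Example (hypothetical counts): a long-tailed 3-class problem.
# Rare classes receive larger weights in the class-balanced loss.
print(class_balanced_weights([5000, 500, 50], beta=0.999))
```

These weights are then applied per class to a standard loss (e.g., softmax cross-entropy or focal loss) to obtain the class-balanced loss.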
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| image-classification-on-inaturalist-2018 | ResNet-152 | Top-1 Accuracy: 69.05% |
| image-classification-on-inaturalist-2018 | ResNet-101 | Top-1 Accuracy: 67.98% |
| image-classification-on-inaturalist-2018 | ResNet-50 | Top-1 Accuracy: 64.16% |
| long-tail-learning-on-cifar-10-lt-r-10 | Class-balanced Focal Loss | Error Rate: 12.90% |
| long-tail-learning-on-cifar-10-lt-r-10 | Class-balanced Re-weighting | Error Rate: 13.46% |
| long-tail-learning-on-cifar-100-lt-r-100 | Cross-Entropy (CE) | Error Rate: 61.68% |
| long-tail-learning-on-coco-mlt | CB Loss (ResNet-50) | Average mAP: 49.06% |
| long-tail-learning-on-egtea | CB Loss | Average Precision: 63.39%; Average Recall: 63.26% |
| long-tail-learning-on-voc-mlt | CB Focal (ResNet-50) | Average mAP: 75.24% |