Abstract
The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.
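The reshaping of cross entropy described in the abstract can be illustrated with a small sketch. It assumes the binary form of the loss given in the body of the paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), which the abstract itself does not spell out. The function names and the example probabilities are hypothetical, chosen only for illustration, and the defaults gamma = 2 and alpha = 0.25 are the settings the paper reports as working well. This is not the Detectron implementation.

```python
import math


def binary_cross_entropy(p, y):
    """Standard cross entropy for one binary prediction.

    p: predicted probability of the foreground class, in (0, 1)
    y: 1 for foreground, 0 for background
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss sketch: cross entropy scaled by alpha_t * (1 - p_t) ** gamma.

    gamma and alpha are assumed defaults taken from the paper's body;
    gamma = 0 with alpha_t = 1 reduces to plain cross entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    modulating_factor = (1.0 - p_t) ** gamma  # near 0 for well-classified examples
    return -alpha_t * modulating_factor * math.log(p_t)


if __name__ == "__main__":
    # Hypothetical examples: an easy background anchor and a hard one.
    examples = {
        "easy negative (p=0.01)": (0.01, 0),
        "hard negative (p=0.90)": (0.90, 0),
    }
    for name, (p, y) in examples.items():
        ce = binary_cross_entropy(p, y)
        fl = binary_focal_loss(p, y)
        print(f"{name}: CE={ce:.6f} FL={fl:.6f} FL/CE={fl / ce:.6f}")
```

Under these assumptions, the easy negative keeps only a tiny fraction of its cross entropy loss, while the hard negative keeps most of it. This is the mechanism by which the many easy background locations stop dominating the summed training loss.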
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| 2d-object-detection-on-sardet-100k | RetinaNet | box mAP: 47.4 |
| dense-object-detection-on-sku-110k | RetinaNet | AP: 45.5; AP75: 0.389 |
| face-identification-on-trillion-pairs-dataset | F-Softmax | Accuracy: 39.80 |
| face-verification-on-trillion-pairs-dataset | F-Softmax | Accuracy: 37.14 |
| long-tail-learning-on-coco-mlt | Focal Loss (ResNet-50) | Average mAP: 49.46 |
| long-tail-learning-on-egtea | Focal Loss (3D-ResNeXt101) | Average Precision: 59.09; Average Recall: 59.17 |
| long-tail-learning-on-voc-mlt | Focal Loss (ResNet-50) | Average mAP: 73.88 |
| object-counting-on-carpk | RetinaNet (2018) | MAE: 24.58 |
| object-detection-on-coco | RetinaNet (ResNet-101-FPN) | AP50: 59.1; AP75: 42.3; APL: 50.2; APM: 42.7; APS: 21.8; Hardware Burden: 4G; Operations per network pass: (value missing); box mAP: 39.1 |
| object-detection-on-coco | RetinaNet (ResNeXt-101-FPN) | AP50: 61.1; AP75: 44.1; APL: 51.2; APM: 44.2; APS: 24.1; Hardware Burden: 4G; Operations per network pass: (value missing); box mAP: 40.8 |
| object-detection-on-coco-o | RetinaNet (ResNet-50) | Average mAP: 16.6; Effective Robustness: 0.18 |
| pedestrian-detection-on-tju-ped-campus | RetinaNet | ALL (miss rate): 44.34; HO (miss rate): 71.31; R (miss rate): 34.73; R+HO (miss rate): 42.26; RS (miss rate): 82.99 |
| pedestrian-detection-on-tju-ped-traffic | RetinaNet | ALL (miss rate): 41.40; HO (miss rate): 61.60; R (miss rate): 23.89; R+HO (miss rate): 28.45; RS (miss rate): 37.92 |