Fan Mingyuan; Lai Shenqi; Huang Junshi; Wei Xiaoming; Chai Zhenhua; Luo Junfeng; Wei Xiaolin

Abstract
BiSeNet has proven to be a popular two-stream network for real-time segmentation. However, its principle of adding an extra path to encode spatial information is time-consuming, and the backbones borrowed from pretrained tasks, e.g., image classification, may be inefficient for image segmentation due to the lack of task-specific design. To handle these problems, we propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. Specifically, we gradually reduce the dimension of feature maps and use the aggregation of them for image representation, which forms the basic module of the STDC network. In the decoder, we propose a Detail Aggregation module by integrating the learning of spatial information into low-level layers in a single-stream manner. Finally, the low-level features and deep features are fused to predict the final segmentation results. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of our method by achieving a promising trade-off between segmentation accuracy and inference speed. On Cityscapes, we achieve 71.9% mIoU on the test set with a speed of 250.4 FPS on NVIDIA GTX 1080Ti, which is 45.2% faster than the latest methods, and achieve 76.8% mIoU with 97.0 FPS while inferring on higher resolution images.
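The abstract describes the basic STDC module as gradually reducing the feature-map dimension across blocks and then aggregating the block outputs. The sketch below illustrates only the channel bookkeeping of such a module. It assumes a halving schedule in which the last block repeats the previous width, so that the concatenated widths sum to the module's output width; this schedule, the function name `stdc_channel_schedule`, and the default of four blocks are illustrative assumptions, not details given on this page.

```python
from typing import List


def stdc_channel_schedule(out_channels: int, num_blocks: int = 4) -> List[int]:
    """Return per-block output widths for a hypothetical STDC-style module.

    Assumed schedule (not stated in the abstract): block i outputs
    out_channels / 2**i channels, and the final block repeats the width of
    the block before it, so the concatenation has exactly out_channels.
    """
    if num_blocks < 2:
        raise ValueError("need at least two blocks")
    if out_channels % (2 ** (num_blocks - 1)) != 0:
        raise ValueError("out_channels must be divisible by 2**(num_blocks - 1)")

    # Halve the width at every block except the last one.
    widths = [out_channels // (2 ** i) for i in range(1, num_blocks)]
    # The last block keeps the same width as its predecessor.
    widths.append(widths[-1])
    return widths


widths = stdc_channel_schedule(256, num_blocks=4)
# Concatenating all block outputs restores the requested width.
total_after_concat = sum(widths)
```

With these assumed settings, the per-block widths shrink while their concatenation still matches the requested output width, which is the property the aggregation step relies on.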
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| dichotomous-image-segmentation-on-dis-te1 | STDC | E-measure: 0.798; HCE: 249; MAE: 0.090; S-Measure: 0.723; max F-Measure: 0.648; weighted F-measure: 0.562 |
| dichotomous-image-segmentation-on-dis-te2 | STDC | E-measure: 0.834; HCE: 556; MAE: 0.092; S-Measure: 0.759; max F-Measure: 0.720; weighted F-measure: 0.636 |
| dichotomous-image-segmentation-on-dis-te3 | STDC | E-measure: 0.855; HCE: 1081; MAE: 0.090; S-Measure: 0.771; max F-Measure: 0.745; weighted F-measure: 0.662 |
| dichotomous-image-segmentation-on-dis-te4 | STDC | E-measure: 0.841; HCE: 3819; MAE: 0.102; S-Measure: 0.762; max F-Measure: 0.731; weighted F-measure: 0.652 |
| dichotomous-image-segmentation-on-dis-vd | STDC | E-measure: 0.817; HCE: 1598; MAE: 0.103; S-Measure: 0.740; max F-Measure: 0.696; weighted F-measure: 0.613 |
| real-time-semantic-segmentation-on-cityscapes | STDC2-75 | Frame (fps): 97.0 (1080Ti); mIoU: 76.8% |
| real-time-semantic-segmentation-on-cityscapes | STDC2-50 | Frame (fps): 188.6; mIoU: 73.4% |
| real-time-semantic-segmentation-on-cityscapes | STDC1-50 | Frame (fps): 250.4 (1080Ti); mIoU: 71.9% |
| real-time-semantic-segmentation-on-cityscapes | STDC1-75 | Frame (fps): 126.7; mIoU: 75.3% |
| real-time-semantic-segmentation-on-cityscapes-1 | STDC1-Seg75 | Frame (fps): 126.7; mIoU: 74.5% |
| real-time-semantic-segmentation-on-cityscapes-1 | STDC2-Seg75 | Frame (fps): 97; mIoU: 77% |
| semantic-segmentation-on-bdd100k-val | STDC1 | mIoU: 52.1 (45.8 FPS) |
| semantic-segmentation-on-bdd100k-val | STDC2 | mIoU: 53.8 (33.0 FPS) |
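Several rows above report mIoU (mean intersection over union). As a reading aid, the sketch below shows the standard way this metric is commonly computed from a class confusion matrix. It is a generic illustration, not the evaluation script behind any of these benchmarks; the function name `mean_iou` and the choice to skip classes that never appear are assumptions, and individual benchmarks may handle ignored labels differently.

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean IoU from a square confusion matrix.

    confusion[t][p] counts pixels whose true class is t and predicted class is p.
    Classes with no true or predicted pixels are skipped (an assumption here).
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        union = true_pos + false_pos + false_neg
        if union > 0:
            # IoU for class c: intersection over union of prediction and ground truth.
            ious.append(true_pos / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example: each class has 3 correct pixels, 1 missed, 1 false alarm.
toy_score = mean_iou([[3, 1], [1, 3]])
```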