Gyungin Shin; Weidi Xie; Samuel Albanie

Abstract
Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment. Segmentation methods that forgo supervision can side-step these costs, but exhibit the inconvenient requirement to provide labelled examples from the target distribution to assign concept names to predictions. An alternative line of work in language-image pre-training has recently demonstrated the potential to produce models that can both assign names across large vocabularies of concepts and enable zero-shot transfer for classification, but do not demonstrate commensurate segmentation abilities. In this work, we strive to achieve a synthesis of these two approaches that combines their strengths. We leverage the retrieval abilities of one such language-image pre-trained model, CLIP, to dynamically curate training sets from unlabelled images for arbitrary collections of concept names, and leverage the robust correspondences offered by modern image representations to co-segment entities among the resulting collections. The synthetic segment collections are then employed to construct a segmentation model (without requiring pixel labels) whose knowledge of concepts is inherited from the scalable pre-training process of CLIP. We demonstrate that our approach, termed Retrieve and Co-segment (ReCo), performs favourably to unsupervised segmentation approaches while inheriting the convenience of nameable predictions and zero-shot transfer. We also demonstrate ReCo's ability to generate specialist segmenters for extremely rare objects.
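The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes image and text embeddings have already been computed by a language-image model such as CLIP and simply ranks images by cosine similarity to a concept name's text embedding. The function name `retrieve_top_k`, the toy random embeddings, and the choice of `k` are hypothetical and are not taken from the authors' code; the co-segmentation stage is not shown.

```python
import numpy as np

def retrieve_top_k(image_embeddings, text_embedding, k):
    """Rank images by cosine similarity to a text embedding and keep the top k.

    image_embeddings: (N, D) array, one row per unlabelled image (assumed precomputed).
    text_embedding:   (D,) array for a concept name (assumed precomputed).
    Returns the indices of the k most similar images and their similarity scores.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    img = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    txt = text_embedding / np.linalg.norm(text_embedding)
    sims = img @ txt
    k = min(k, len(sims))
    top = np.argsort(-sims)[:k]  # indices sorted by descending similarity
    return top, sims[top]

# Toy data standing in for real embeddings (hypothetical, for illustration only).
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(100, 16))
# Make the "text" embedding almost identical to image 7 so it should rank first.
text_embedding = image_embeddings[7] + 0.01 * rng.normal(size=16)

indices, scores = retrieve_top_k(image_embeddings, text_embedding, k=5)
print(indices, scores)
```

In this sketch the retrieved indices would form the curated image set for one concept; in practice the embeddings would come from the actual pre-trained model rather than random vectors.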
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| unsupervised-semantic-segmentation-with-1 | ReCo+ | mIoU: 32.6, pixel accuracy: 54.1 |
| unsupervised-semantic-segmentation-with-1 | ReCo | mIoU: 26.3, pixel accuracy: 46.1 |
| unsupervised-semantic-segmentation-with-10 | ReCo | mIoU: 15.7 |
| unsupervised-semantic-segmentation-with-2 | ReCo | mIoU: 29.8, pixel accuracy: 70.6 |
| unsupervised-semantic-segmentation-with-2 | ReCo+ | mIoU: 31.9, pixel accuracy: 75.3 |
| unsupervised-semantic-segmentation-with-3 | ReCo+ | mIoU: 24.2, pixel accuracy: 83.7 |
| unsupervised-semantic-segmentation-with-3 | ReCo | mIoU: 19.3, pixel accuracy: 74.6 |
| unsupervised-semantic-segmentation-with-4 | ReCo | Mean IoU (val): 11.2 |
| unsupervised-semantic-segmentation-with-7 | ReCo | mIoU: 57.7 |
| unsupervised-semantic-segmentation-with-8 | ReCo | mIoU: 22.3 |
| unsupervised-semantic-segmentation-with-9 | ReCo | mIoU: 14.8 |
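
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how mean intersection-over-union (mIoU) and pixel accuracy are commonly computed from a confusion matrix. It is a generic illustration, not the benchmarks' evaluation code: the exact protocol (ignored labels, class matching, averaging over images or over the dataset) is not stated on this page, and the function names are hypothetical. The table values appear to be percentages, whereas this sketch returns fractions.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Count pixels per (target class, predicted class) pair."""
    idx = target.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def miou_and_pixel_accuracy(pred, target, num_classes):
    """Return (mean IoU over classes that occur, overall pixel accuracy)."""
    cm = confusion_matrix(pred, target, num_classes)
    tp = np.diag(cm)                          # correctly labelled pixels per class
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0                         # skip classes absent from both maps
    iou = tp[valid] / union[valid]
    return iou.mean(), tp.sum() / cm.sum()

# Tiny 2x2 example with two classes (hypothetical data).
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
miou, pixel_acc = miou_and_pixel_accuracy(pred, target, num_classes=2)
print(miou, pixel_acc)  # IoUs are 1/2 and 2/3, so mIoU is about 0.583; accuracy is 0.75
```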