Junho Kim; Byung-Kwan Lee; Yong Man Ro

Abstract
Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations. With the advent of self-supervised pre-training, various frameworks utilize the pre-trained features to train prediction heads for unsupervised dense prediction. However, a significant challenge in this unsupervised setup is determining the appropriate level of clustering required for segmenting concepts. To address this, we propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference. Specifically, we draw on an intervention-oriented approach (i.e., frontdoor adjustment) to define two suitable steps for unsupervised prediction. The first step involves constructing a concept clusterbook as a mediator, which represents possible concept prototypes at different levels of granularity in a discretized form. The mediator then establishes an explicit link to the second step, concept-wise self-supervised learning for pixel-level grouping. Through extensive experiments and analyses on various datasets, we corroborate the effectiveness of CAUSE and achieve state-of-the-art performance in unsupervised semantic segmentation.
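For readers unfamiliar with the frontdoor adjustment mentioned in the abstract, its standard form from causal inference is reproduced below. The symbols are assumptions made for illustration and are not taken from the paper: $T$ stands for the pre-trained features, $M$ for the mediator (here read as the concept clusterbook), and $Y$ for the pixel-level grouping.

$$
P\big(Y \mid do(T = t)\big) \;=\; \sum_{m} P(m \mid t) \sum_{t'} P\big(Y \mid m, t'\big)\, P(t')
$$

Under this reading, the outer sum corresponds to the first step (estimating the mediator from the input) and the inner sum to the second step (predicting the output given the mediator), which is one plausible way to see why the abstract describes the method as two tasks.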
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| unsupervised-semantic-segmentation-on | CAUSE (DINOv2, ViT-B/14) | Accuracy: 89.8, mIoU: 29.9 |
| unsupervised-semantic-segmentation-on | CAUSE (ViT-B/8) | Accuracy: 90.8, mIoU: 28.0 |
| unsupervised-semantic-segmentation-on-coco-6 | CAUSE-TR (ViT-S/8) | Pixel Accuracy: 46.6, mIoU: 15.2 |
| unsupervised-semantic-segmentation-on-coco-7 | CAUSE (ViT-B/8) | Accuracy: 74.9, mIoU: 41.9 |
| unsupervised-semantic-segmentation-on-coco-7 | CAUSE (DINOv2, ViT-B/14) | Accuracy: 78.0, mIoU: 45.3 |
| unsupervised-semantic-segmentation-on-coco-8 | CAUSE-TR (ViT-S/8) | Pixel Accuracy: 75.2, mIoU: 21.2 |
| unsupervised-semantic-segmentation-on-coco-8 | CAUSE-MLP (ViT-S/8) | Pixel Accuracy: 78.8, mIoU: 19.1 |
| unsupervised-semantic-segmentation-on-pascal-1 | CAUSE (ViT-B/8) | Clustering [mIoU]: 53.3 |
| unsupervised-semantic-segmentation-on-pascal-1 | CAUSE (iBOT, ViT-B/16) | Clustering [mIoU]: 53.4 |
| unsupervised-semantic-segmentation-on-pascal-1 | CAUSE (DINOv2, ViT-B/14) | Clustering [mIoU]: 53.2 |
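The two metrics reported in the table, pixel accuracy and mIoU, can both be read off a confusion matrix. The following is a minimal sketch in plain Python, not the evaluation code used for these benchmarks. The function names and the toy label lists are hypothetical, and the sketch assumes predicted cluster IDs have already been matched to ground-truth classes (in unsupervised evaluation this matching is typically done first, for example with Hungarian matching).

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the ground-truth class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example (assumed data): 8 flattened pixels, 3 classes,
# predictions already matched to ground-truth class IDs.
pred = [0, 0, 1, 1, 2, 2, 2, 0]
target = [0, 0, 1, 2, 2, 2, 1, 0]

cm = confusion_matrix(pred, target, num_classes=3)
print(f"pixel accuracy: {pixel_accuracy(cm):.3f}")
print(f"mIoU: {mean_iou(cm):.3f}")
```

The toy example shows why the two numbers in a row can differ widely: accuracy is dominated by frequent classes, while mIoU weights every class equally, so a class that is segmented poorly pulls mIoU down even when most pixels are correct.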