Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut
Yangtao Wang; Xi Shen; Shell Hu; Yuan Yuan; James Crowley; Dominique Vaufreydaz

Abstract
Transformers trained with self-supervised learning using a self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses self-supervised transformer features to discover an object in an image. Visual tokens are viewed as nodes in a weighted graph, with edges representing a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph cut that groups self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the eigenvector associated with the second smallest eigenvalue provides a cutting solution, since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state-of-the-art method LOST by margins of 6.9%, 8.1%, and 8.1% on VOC07, VOC12, and COCO20K, respectively. Performance can be further improved by adding a second-stage class-agnostic detector (CAD). The proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU by 4.9%, 5.2%, and 12.9% on ECSSD, DUTS, and DUT-OMRON, respectively, compared to the previous state of the art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet.
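The graph-cut step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ncut_bipartition`, the similarity threshold `tau`, the small edge weight `eps`, the binarization of edge weights, and the choice to split the eigenvector at its mean are all assumptions made for illustration. Only the overall recipe (token similarity graph, generalized eigen-decomposition, eigenvector of the second smallest eigenvalue, foreground chosen by largest absolute value) comes from the text above.

```python
import numpy as np
from scipy.linalg import eigh


def ncut_bipartition(features, tau=0.2, eps=1e-5):
    """Split tokens into two groups with a normalized cut.

    features: (N, D) array, one row per visual token (rows must be non-zero).
    tau, eps: hypothetical hyperparameters, not taken from the source.
    Returns (foreground_mask, second_eigenvector).
    """
    # Cosine similarity between every pair of tokens.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T

    # Assumed edge weighting: strong edge if similar, tiny edge otherwise.
    # Keeping eps > 0 makes the graph fully connected and D positive definite.
    W = np.where(sim > tau, 1.0, eps)
    D = np.diag(W.sum(axis=1))

    # Generalized eigenproblem (D - W) y = lambda * D y.
    # eigh returns eigenvalues in ascending order.
    _, eigvecs = eigh(D - W, D)
    second = eigvecs[:, 1]  # eigenvector of the second smallest eigenvalue

    # Assumed split point: the mean of the eigenvector.
    mask = second > second.mean()

    # The token with the largest absolute value is treated as foreground,
    # which also removes the sign ambiguity of the eigenvector.
    seed = np.argmax(np.abs(second))
    if not mask[seed]:
        mask = ~mask
    return mask, second
```

Calling `ncut_bipartition(tokens)` on an `(N, D)` array of patch features returns a boolean mask over tokens; reshaping that mask to the patch grid would give a coarse foreground map under the assumptions listed above.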
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| single-object-discovery-on-coco-20k | TokenCut + CAD | CorLoc: 62.6 |
| single-object-discovery-on-coco-20k | TokenCut | CorLoc: 58.8 |
| unsupervised-saliency-detection-on-dut-omron | TokenCut | Accuracy: 89.7; IoU: 61.8; maximal F-measure: 69.7 |
| unsupervised-saliency-detection-on-duts | TokenCut | Accuracy: 91.4; IoU: 62.4; maximal F-measure: 75.5 |
| unsupervised-saliency-detection-on-ecssd | TokenCut | Accuracy: 93.4; IoU: 77.2; maximal F-measure: 87.4 |
| weakly-supervised-object-localization-on-2 | TokenCut | GT-known localization accuracy: 65.4; Top-1 localization accuracy: 52.3 |
| weakly-supervised-object-localization-on-cub | TokenCut | Top-1 Localization Accuracy: 72.9 |
| weakly-supervised-object-localization-on-cub-1 | TokenCut | Top-1 Localization Accuracy: 72.9 |