Local-Global Contrastive Learning for Cross-Domain Object Detection
Danai Triantafyllidou, Sarah Parisot, Ales Leonardis, Steven McDonagh

Abstract
Visual domain gaps often impact object detection performance. Image-to-image translation can mitigate this effect, where contrastive approaches enable learning of the image-to-image mapping under unsupervised regimes. However, existing methods often fail to handle content-rich scenes with multiple object instances, which manifests in unsatisfactory detection performance. Sensitivity to such instance-level content is typically only gained through object annotations, which can be expensive to obtain. Towards addressing this issue, we present a novel image-to-image translation method that specifically targets cross-domain object detection. We formulate our approach as a contrastive learning framework with an inductive prior that optimises the appearance of object instances through spatial attention masks, implicitly delineating the scene into foreground regions associated with the target object instances and background non-object regions. Instead of relying on object annotations to explicitly account for object instances during translation, our approach learns to represent objects by contrasting local-global information. This affords investigation of an under-explored challenge: obtaining performant detection, under domain shifts, without relying on object annotations or detector model fine-tuning. We experiment with multiple cross-domain object detection settings across three challenging benchmarks and report state-of-the-art performance. Project page: https://local-global-detection.github.io
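The local-global contrast described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an InfoNCE-style objective in which each local patch embedding is pulled towards an attention-pooled global embedding of its own image and pushed away from those of other images. All function names (`info_nce`, `attention_pool`) and the temperature value are hypothetical.

```python
import numpy as np

def attention_pool(feat_map, attn):
    """Pool an (H, W, D) feature map into one (D,) global vector
    using a soft foreground attention mask of shape (H, W)."""
    w = attn / (attn.sum() + 1e-8)          # normalise mask to a distribution
    return np.tensordot(w, feat_map, axes=([0, 1], [0, 1]))

def info_nce(local_feats, global_feats, tau=0.07):
    """InfoNCE loss over N paired embeddings.

    local_feats:  (N, D) one local patch embedding per image
    global_feats: (N, D) attention-pooled global embedding per image
    Positive pair is (local_i, global_i); all other global
    embeddings in the batch act as negatives.
    """
    l = local_feats / np.linalg.norm(local_feats, axis=1, keepdims=True)
    g = global_feats / np.linalg.norm(global_feats, axis=1, keepdims=True)
    logits = l @ g.T / tau                          # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    idx = np.arange(len(l))
    return -np.mean(np.log(probs[idx, idx]))        # -log p(positive)
```

Minimising this loss makes each patch embedding most similar to its own image's foreground-pooled global vector, which is one plausible way to encourage object-aware translation without box annotations.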
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| unsupervised-domain-adaptation-on-cityscapes-1 | LGCL (unsupervised) | mAP@0.5: 45.3 |
| unsupervised-domain-adaptation-on-cityscapes-1 | LGCL (supervised) | mAP@0.5: 46.7 |