5 months ago

SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation

Sayan Nag; Koustava Goswami; Srikrishna Karanam

Abstract

Referring Expression Segmentation (RES) aims to produce a segmentation mask of the target object in an image that is referred to by a text (i.e., a referring expression). Existing methods require large-scale mask annotations and do not generalize well to unseen/zero-shot scenarios. To address these issues, we propose a weakly-supervised bootstrapping architecture for RES with several new algorithmic innovations. To the best of our knowledge, ours is the first approach that considers only a fraction of both mask and box annotations (shown in Figure 1 and Table 1) for training. To enable principled training of models in such low-annotation settings, improve image-text region-level alignment, and further enhance spatial localization of the target object in the image, we propose a Cross-modal Fusion with Attention Consistency module. For automatic pseudo-labeling of unlabeled samples, we introduce a novel Mask Validity Filtering routine based on a spatially aware zero-shot proposal scoring approach. Extensive experiments show that with just 30% annotations, our model SafaRi achieves 59.31 and 48.26 mIoU on RefCOCO+ testA and RefCOCO+ testB, respectively, compared to 58.93 and 48.19 mIoU obtained by the fully-supervised SOTA method SeqTR. SafaRi also outperforms SeqTR by 11.7% (on RefCOCO+ testA) and 19.6% (on RefCOCO+ testB) in a fully-supervised setting and demonstrates strong generalization capabilities in unseen/zero-shot tasks.

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| referring-expression-segmentation-on-davis | SafaRi-B | J&F 1st frame: 61.3; Zero-Shot Transfer: true |
| referring-expression-segmentation-on-refcoco | SafaRi-B | Overall IoU: 77.21 |
| referring-expression-segmentation-on-refcoco-3 | SafaRi-B | Overall IoU: 70.78 |
| referring-expression-segmentation-on-refcoco-4 | SafaRi-B | Overall IoU: 74.53 |
| referring-expression-segmentation-on-refcoco-5 | SafaRi-B | Overall IoU: 64.88 |
| referring-expression-segmentation-on-refcoco-8 | SafaRi | Overall IoU: 77.83 |
| referring-expression-segmentation-on-refcoco-9 | SafaRi | Overall IoU: 70.71 |
| referring-expression-segmentation-on-refcocog | SafaRi-B | Overall IoU: 70.48 |
| referring-expression-segmentation-on-refcocog-1 | SafaRi-B | Overall IoU: 71.06 |
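
The abstract reports mIoU while the benchmark entries report Overall IoU, and this page does not define either metric. The sketch below follows the convention commonly used in the RES literature, which is an assumption here and not taken from the paper: mIoU averages the per-sample IoU, whereas Overall IoU divides the summed intersection by the summed union over all samples. The function names and the toy masks are hypothetical and serve illustration only.

```python
from typing import List, Sequence, Tuple

# A binary mask represented as rows of 0/1 values (illustrative format only).
Mask = Sequence[Sequence[int]]


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and in the union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter, union


def mean_iou(pairs: List[Tuple[Mask, Mask]]) -> float:
    """Average of per-sample IoU values (each sample weighs the same)."""
    ious = []
    for pred, target in pairs:
        inter, union = intersection_and_union(pred, target)
        # Convention chosen for this sketch: an empty union scores 0.0.
        ious.append(inter / union if union else 0.0)
    return sum(ious) / len(ious)


def overall_iou(pairs: List[Tuple[Mask, Mask]]) -> float:
    """Summed intersection over summed union (large objects weigh more)."""
    total_inter = 0
    total_union = 0
    for pred, target in pairs:
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


# Toy data: (predicted mask, ground-truth mask) pairs.
toy_pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # intersection 1, union 2
    ([[1, 1], [1, 1]], [[1, 1], [1, 1]]),  # intersection 4, union 4
]

print(f"mIoU: {mean_iou(toy_pairs):.4f}")
print(f"Overall IoU: {overall_iou(toy_pairs):.4f}")
```

On the toy data the two metrics differ (0.75 versus 5/6) because Overall IoU gives more weight to samples with larger masks. For that reason the mIoU figures in the abstract and the Overall IoU figures in the table should not be compared directly.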
