Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints

Dai Ming; Li Jian; Zhuang Jiedong; Zhang Xian; Yang Wankou

Abstract

Multi-task visual grounding involves jointly localizing and segmenting objects in images based on textual expressions. Most advanced methods focus on transformer-based multimodal fusion, aiming to extract robust multimodal representations. However, the ambiguity between referring expression comprehension (REC) and referring image segmentation (RIS) makes predictions error-prone, leading to inconsistencies between the outputs of the two tasks. In addition, insufficient multimodal understanding directly contributes to biased target perception. To overcome these challenges, we propose a Coarse-to-fine Consistency Constraints Visual Grounding architecture (C³VG), which integrates implicit and explicit modeling approaches within a two-stage framework. In the first stage, which we call the Rough Semantic Perception (RSP) stage, query and pixel decoders generate preliminary detection and segmentation outputs. In the second stage, the Refined Consistency Interaction (RCI) stage, these coarse predictions are refined through the proposed Mask-guided Interaction Module (MIM) and a novel explicit bidirectional consistency constraint loss that enforces consistent representations across tasks. Furthermore, to address insufficient multimodal understanding, we leverage pre-trained models based on visual-linguistic fusion representations. Empirical evaluations on the RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate the efficacy and soundness of C³VG, which outperforms state-of-the-art REC and RIS methods by a substantial margin. Code and model will be available at https://github.com/Dmmm1997/C3VG.
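
The abstract does not specify the form of the bidirectional consistency constraint. As a rough illustration only, the sketch below shows one plausible way to score agreement between a predicted box and a predicted mask in both directions: the box constrains the mask (mask probability mass falling outside the box is penalized), and the mask constrains the box (1 - IoU between the predicted box and the tight box around the thresholded mask). The function names (`box_iou`, `mask_to_box`, `consistency_loss`), the 0.5 threshold, and both terms are hypothetical assumptions, not the C³VG loss. It uses plain Python lists rather than PyTorch tensors, and the thresholded IoU term is not differentiable, so a real training loss would need a soft formulation.

```python
from typing import List, Optional, Tuple

# Hypothetical sketch: NOT the loss defined in the C3VG paper.
# A box is (x1, y1, x2, y2) covering pixels x1 <= x < x2, y1 <= y < y2.
Box = Tuple[int, int, int, int]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two half-open pixel boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_to_box(mask: List[List[float]], thr: float = 0.5) -> Optional[Box]:
    """Tight box around pixels with probability >= thr (None if no such pixel)."""
    ys = [y for y, row in enumerate(mask) for p in row if p >= thr]
    xs = [x for row in mask for x, p in enumerate(row) if p >= thr]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def consistency_loss(box: Box, mask: List[List[float]], thr: float = 0.5) -> float:
    """Sum of a box-constrains-mask term and a mask-constrains-box term."""
    # Box constrains mask: fraction of mask probability mass outside the box.
    total = sum(p for row in mask for p in row)
    inside = sum(
        p
        for y, row in enumerate(mask)
        for x, p in enumerate(row)
        if box[0] <= x < box[2] and box[1] <= y < box[3]
    )
    outside_frac = 1.0 - inside / total if total > 0 else 0.0

    # Mask constrains box: 1 - IoU with the thresholded mask's tight box.
    mbox = mask_to_box(mask, thr)
    box_term = 1.0 - box_iou(box, mbox) if mbox is not None else 1.0
    return outside_frac + box_term
```

For a 4x4 mask whose 2x2 foreground occupies columns 1-2 and rows 1-2, the matching box `(1, 1, 3, 3)` gives a loss of 0.0, while the shifted box `(2, 1, 4, 3)` leaves half the mask mass outside (0.5) and has IoU 1/3 with the mask's box (2/3), for a total of about 1.167.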

Code Repositories

dmmm1997/c3vg (Official, PyTorch)
