5 months ago

VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search

He Shuting; Luo Hao; Jiang Wei; Jiang Xudong; Ding Henghui

Abstract

Text-based Person Search (TBPS) aims to retrieve images of the target pedestrian indicated by textual descriptions. It is essential for TBPS to extract fine-grained local features and align them across modalities. Existing methods utilize external tools or heavy cross-modal interaction to achieve explicit alignment of cross-modal fine-grained features, which is inefficient and time-consuming. In this work, we propose a Vision-Guided Semantic-Group Network (VGSG) for text-based person search to extract well-aligned fine-grained visual and textual features. In the proposed VGSG, we develop a Semantic-Group Textual Learning (SGTL) module and a Vision-guided Knowledge Transfer (VGKT) module to extract textual local features under the guidance of visual local clues. In SGTL, to obtain the local textual representation, we group textual features along the channel dimension based on the semantic cues of the language expression, which encourages similar semantic patterns to be grouped implicitly without external tools. In VGKT, a vision-guided attention is employed to extract visual-related textual features, which are inherently aligned with visual cues and termed vision-guided textual features. Furthermore, we design a relational knowledge transfer, including a vision-language similarity transfer and a class probability transfer, to adaptively propagate information from the vision-guided textual features to the semantic-group textual features. With the help of relational knowledge transfer, VGKT is capable of aligning semantic-group textual features with the corresponding visual features without external tools or complex pairwise interaction. Experimental results on two challenging benchmarks demonstrate its superiority over state-of-the-art methods.
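
To make the idea of grouping textual features along the channel dimension more concrete, here is a minimal sketch in plain Python. It is not the authors' SGTL implementation: the abstract does not specify how the groups are formed or pooled, so the contiguous channel split, the max-pooling over tokens, and the names `group_channels`, `token_features`, and `num_groups` are all assumptions made for illustration.

```python
def group_channels(token_features, num_groups):
    """Split channels into contiguous groups and max-pool each group over tokens.

    token_features: list of per-token channel vectors (all of equal length).
    Returns num_groups local feature vectors, one per channel group.
    Assumption: contiguous split plus max-pooling; the paper may differ.
    """
    num_channels = len(token_features[0])
    if num_channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    group_size = num_channels // num_groups

    local_features = []
    for g in range(num_groups):
        start, end = g * group_size, (g + 1) * group_size
        # For each channel in this group, keep the strongest response over all tokens.
        pooled = [max(token[c] for token in token_features) for c in range(start, end)]
        local_features.append(pooled)
    return local_features


# Three tokens with four channels each, split into two groups of two channels.
tokens = [
    [0.1, 0.9, 0.3, 0.2],
    [0.5, 0.4, 0.8, 0.1],
    [0.2, 0.7, 0.6, 0.9],
]
groups = group_channels(tokens, num_groups=2)
```

Each returned vector plays the role of one local textual feature. In the method described above, such group features would then be aligned with visual local features through the knowledge transfer in VGKT, which this sketch does not attempt to model.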

Benchmarks

Benchmark | Methodology | Metrics
nlp-based-person-retrival-on-cuhk-pedes | VGSG (ViT-Base) | R@1: 71.38; R@5: 86.75; R@10: 91.86; mAP: 67.91
text-based-person-retrieval-on-icfg-pedes | VGSG (ViT-Base) | R@1: 63.05; R@5: 78.43; R@10: 84.36
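
For readers unfamiliar with the R@K figures reported above, the following sketch shows how Recall@K is commonly computed for retrieval: a text query counts as a hit if at least one of its top-K ranked gallery images shares the query's identity. The function name `recall_at_k` and the toy rankings are illustrative assumptions, not taken from the benchmark's evaluation code.

```python
def recall_at_k(ranked_gallery_ids, query_ids, k):
    """Percentage of queries whose identity appears among the top-k ranked gallery identities.

    ranked_gallery_ids: one list per query, gallery identity labels sorted by
                        descending similarity to that query.
    query_ids: ground-truth identity label of each query.
    """
    hits = 0
    for ranked, query_id in zip(ranked_gallery_ids, query_ids):
        if query_id in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(query_ids)


# Toy example: three text queries, all describing identity 1.
rankings = [
    [3, 1, 2],  # correct identity at rank 2
    [2, 3, 1],  # correct identity at rank 3
    [1, 2, 3],  # correct identity at rank 1
]
labels = [1, 1, 1]

r1 = recall_at_k(rankings, labels, 1)
r2 = recall_at_k(rankings, labels, 2)
r3 = recall_at_k(rankings, labels, 3)
```

In this toy case one of three queries is a hit at K=1, two at K=2, and all three at K=3, so the metric is non-decreasing in K, consistent with the R@1 < R@5 < R@10 ordering in the table.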
