Learning Semantic-Aligned Feature Representation for Text-based Person Search
Li Shiping, Cao Min, Zhang Min

Abstract
Text-based person search aims to retrieve images of a certain pedestrian given a textual description. The key challenge of this task is to eliminate the inter-modality gap and achieve feature alignment across modalities. In this paper, we propose a semantic-aligned embedding method for text-based person search, in which feature alignment across modalities is achieved by automatically learning semantic-aligned visual and textual features. First, we introduce two Transformer-based backbones to encode robust feature representations of the images and texts. Second, we design a semantic-aligned feature aggregation network to adaptively select and aggregate features with the same semantics into part-aware features; this is achieved by a multi-head attention module constrained by a cross-modality part alignment loss and a diversity loss. Experimental results on the CUHK-PEDES and Flickr30K datasets show that our method achieves state-of-the-art performance.
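To make the aggregation step concrete, below is a minimal PyTorch sketch of the general idea: learnable part queries attend over backbone token features to form part-aware features, with a diversity loss discouraging redundant parts and a cross-modality part alignment loss pulling corresponding visual and textual parts together. This is not the authors' code; the class and function names (`PartAwareAggregation`, `diversity_loss`, `part_alignment_loss`), the choice of `num_parts`, and the symmetric InfoNCE formulation of the alignment loss are illustrative assumptions, and the paper's exact loss definitions may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartAwareAggregation(nn.Module):
    """Aggregates token-level features into K part-aware features.

    Hypothetical sketch: K learnable part queries attend over the
    token sequence produced by a Transformer backbone.
    """
    def __init__(self, dim: int = 768, num_parts: int = 6, num_heads: int = 8):
        super().__init__()
        # One learnable query per semantic part (assumed design choice).
        self.part_queries = nn.Parameter(torch.randn(num_parts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) token features from the image or text backbone.
        B = tokens.size(0)
        q = self.part_queries.unsqueeze(0).expand(B, -1, -1)  # (B, K, dim)
        parts, _ = self.attn(q, tokens, tokens)               # (B, K, dim)
        return parts

def diversity_loss(parts: torch.Tensor) -> torch.Tensor:
    """Penalizes overlap between parts: off-diagonal cosine similarity
    among the K part features should be small."""
    p = F.normalize(parts, dim=-1)            # (B, K, dim)
    sim = p @ p.transpose(1, 2)               # (B, K, K) pairwise similarity
    off_diag = sim - torch.eye(p.size(1), device=p.device)
    return off_diag.pow(2).mean()

def part_alignment_loss(img_parts, txt_parts, temperature: float = 0.07):
    """Cross-modality part alignment (assumed form): the k-th visual part
    is matched to the k-th textual part via symmetric InfoNCE over the batch."""
    v = F.normalize(img_parts, dim=-1)        # (B, K, dim)
    t = F.normalize(txt_parts, dim=-1)
    B, K, _ = v.shape
    loss = 0.0
    for k in range(K):
        logits = v[:, k] @ t[:, k].T / temperature  # (B, B) similarity matrix
        labels = torch.arange(B, device=logits.device)
        loss = loss + 0.5 * (F.cross_entropy(logits, labels)
                             + F.cross_entropy(logits.T, labels))
    return loss / K
```

Under these assumptions, the same aggregation module is applied on top of each backbone, and the two losses are summed with the retrieval objective; the diversity term is what forces the K queries to specialize on different semantic parts rather than collapsing onto the same tokens.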
Benchmarks
| Benchmark | Methodology | R@1 | R@5 | R@10 |
|---|---|---|---|---|
| nlp-based-person-retrival-on-cuhk-pedes | SAF | 64.13 | 82.62 | 88.4 |