Learning Semantic-Aligned Feature Representation for Text-based Person Search
Li Shiping, Cao Min, Zhang Min

Abstract
Text-based person search aims to retrieve images of a certain pedestrian given a textual description. The key challenge of this task is to eliminate the inter-modality gap and achieve feature alignment across modalities. In this paper, we propose a semantic-aligned embedding method for text-based person search, in which feature alignment across modalities is achieved by automatically learning semantic-aligned visual and textual features. First, we introduce two Transformer-based backbones to encode robust feature representations of the images and texts. Second, we design a semantic-aligned feature aggregation network to adaptively select and aggregate features with the same semantics into part-aware features; this is achieved by a multi-head attention module constrained by a cross-modality part alignment loss and a diversity loss. Experimental results on the CUHK-PEDES and Flickr30K datasets show that our method achieves state-of-the-art performance.
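To make the aggregation step concrete, below is a minimal PyTorch sketch of the general idea: learnable part queries attend over backbone token features to form part-aware features, with a diversity loss discouraging redundant parts and a cross-modality part alignment loss pulling corresponding visual and textual parts together. This is not the authors' code; the class and function names (`PartAwareAggregation`, `diversity_loss`, `part_alignment_loss`), the choice of `num_parts`, and the symmetric InfoNCE formulation of the alignment loss are illustrative assumptions, and the paper's exact loss definitions may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartAwareAggregation(nn.Module):
    """Aggregates token-level features into K part-aware features.

    Hypothetical sketch: K learnable part queries attend over the
    token sequence produced by a Transformer backbone.
    """
    def __init__(self, dim: int = 768, num_parts: int = 6, num_heads: int = 8):
        super().__init__()
        # One learnable query per semantic part (assumed design choice).
        self.part_queries = nn.Parameter(torch.randn(num_parts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) token features from the image or text backbone.
        B = tokens.size(0)
        q = self.part_queries.unsqueeze(0).expand(B, -1, -1)  # (B, K, dim)
        parts, _ = self.attn(q, tokens, tokens)               # (B, K, dim)
        return parts

def diversity_loss(parts: torch.Tensor) -> torch.Tensor:
    """Penalizes overlap between parts: off-diagonal cosine similarity
    among the K part features should be small."""
    p = F.normalize(parts, dim=-1)            # (B, K, dim)
    sim = p @ p.transpose(1, 2)               # (B, K, K) pairwise similarity
    off_diag = sim - torch.eye(p.size(1), device=p.device)
    return off_diag.pow(2).mean()

def part_alignment_loss(img_parts, txt_parts, temperature: float = 0.07):
    """Cross-modality part alignment (assumed form): the k-th visual part
    is matched to the k-th textual part via symmetric InfoNCE over the batch."""
    v = F.normalize(img_parts, dim=-1)        # (B, K, dim)
    t = F.normalize(txt_parts, dim=-1)
    B, K, _ = v.shape
    loss = 0.0
    for k in range(K):
        logits = v[:, k] @ t[:, k].T / temperature  # (B, B) similarity matrix
        labels = torch.arange(B, device=logits.device)
        loss = loss + 0.5 * (F.cross_entropy(logits, labels)
                             + F.cross_entropy(logits.T, labels))
    return loss / K
```

Under these assumptions, the same aggregation module is applied on top of each backbone, and the two losses are summed with the retrieval objective; the diversity term is what forces the K queries to specialize on different semantic parts rather than collapsing onto the same tokens.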
Benchmarks
| Benchmark | Methodology | R@1 | R@5 | R@10 |
|---|---|---|---|---|
| nlp-based-person-retrival-on-cuhk-pedes | SAF | 64.13 | 82.62 | 88.4 |