5 months ago

Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark

Yang Shuyu ; Zhou Yinan ; Wang Yaxiong ; Wu Yujiao ; Zhu Li ; Zheng Zhedong

Abstract

In this paper, we introduce a large Multi-Attribute and Language Search dataset for text-based person retrieval, called MALS, and explore the feasibility of pre-training on both attribute recognition and image-text matching tasks at once. In particular, MALS contains 1,510,330 image-text pairs, which is about 37.5 times larger than the prevailing CUHK-PEDES, and all images are annotated with 27 attributes. Considering privacy concerns and annotation costs, we leverage off-the-shelf diffusion models to generate the dataset. To verify the feasibility of learning from the generated data, we develop a new joint Attribute Prompt Learning and Text Matching Learning (APTM) framework, considering the shared knowledge between attribute and text. As the name implies, APTM contains an attribute prompt learning stream and a text matching learning stream. (1) The attribute prompt learning leverages the attribute prompts for image-attribute alignment, which enhances the text matching learning. (2) The text matching learning facilitates the representation learning on fine-grained details, and in turn, boosts the attribute prompt learning. Extensive experiments validate the effectiveness of the pre-training on MALS, achieving state-of-the-art retrieval performance via APTM on three challenging real-world benchmarks. In particular, APTM achieves consistent improvements of +6.96%, +7.68%, and +16.95% in Recall@1 accuracy on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets, respectively.

Code Repositories

Shuyu-XJTU/APTM (Official; pytorch; Mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
nlp-based-person-retrival-on-cuhk-pedes | APTM | R@1: 76.53; R@5: 90.04; R@10: 94.15; mAP: 66.91
pedestrian-attribute-recognition-on-pa-100k | APTM | Accuracy: 80.17
text-based-person-retrieval-on-icfg-pedes | APTM | R@1: 68.51; mAP: 41.22
text-based-person-retrieval-on-rstpreid-1 | APTM | R@1: 67.50; R@5: 85.70; R@10: 91.45
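
The R@K and mAP figures above are standard retrieval metrics. This page does not spell out the evaluation protocol, so the following is only a minimal sketch of the common definitions, not the APTM evaluation code. It assumes a text-to-image similarity matrix with one row per text query and one column per gallery image, and it uses identity labels to decide whether a retrieved image is correct. The function names, variable names, and toy scores are illustrative assumptions.

```python
def rank_gallery(scores):
    """Return gallery indices sorted by descending similarity to one query."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def recall_at_k(similarity, query_ids, gallery_ids, k):
    """Fraction of queries with at least one correct identity in the top k."""
    hits = 0
    for scores, qid in zip(similarity, query_ids):
        top_k = rank_gallery(scores)[:k]
        if any(gallery_ids[j] == qid for j in top_k):
            hits += 1
    return hits / len(query_ids)


def mean_average_precision(similarity, query_ids, gallery_ids):
    """Mean over queries of the average precision at each correct hit."""
    ap_values = []
    for scores, qid in zip(similarity, query_ids):
        num_hits = 0
        precisions = []
        for rank, j in enumerate(rank_gallery(scores), start=1):
            if gallery_ids[j] == qid:
                num_hits += 1
                precisions.append(num_hits / rank)
        if precisions:  # skip queries with no matching gallery image
            ap_values.append(sum(precisions) / len(precisions))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


# Toy data (hypothetical): 3 text queries, 4 gallery images.
gallery_ids = [10, 20, 20, 30]
query_ids = [10, 20, 30]
similarity = [
    [0.9, 0.2, 0.1, 0.3],  # query for identity 10
    [0.5, 0.4, 0.6, 0.1],  # query for identity 20
    [0.7, 0.1, 0.2, 0.6],  # query for identity 30
]

r1 = recall_at_k(similarity, query_ids, gallery_ids, k=1)
r2 = recall_at_k(similarity, query_ids, gallery_ids, k=2)
m_ap = mean_average_precision(similarity, query_ids, gallery_ids)
print(f"R@1={r1:.4f}  R@2={r2:.4f}  mAP={m_ap:.4f}")
```

In this toy case two of the three queries rank a correct image first, and the third finds it at rank 2, so R@1 is 2/3 and R@2 is 1.0. The benchmark values above are reported as percentages, so a score computed this way would be multiplied by 100 before comparison.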
