5 months ago

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval

Jiang Ding ; Ye Mang

Abstract

Text-to-image person retrieval aims to identify the target person based on a given textual description query. The primary challenge is to learn the mapping of visual and textual modalities into a common latent space. Prior works have attempted to address this challenge by leveraging separately pre-trained unimodal models to extract visual and textual features. However, these approaches lack the necessary underlying alignment capabilities required to match multimodal data effectively. Besides, these works use prior information to explore explicit part alignments, which may lead to the distortion of intra-modality information. To alleviate these issues, we present IRRA: a cross-modal Implicit Relation Reasoning and Aligning framework that learns relations between local visual-textual tokens and enhances global image-text matching without requiring additional prior supervision. Specifically, we first design an Implicit Relation Reasoning module in a masked language modeling paradigm. This achieves cross-modal interaction by integrating the visual cues into the textual tokens with a cross-modal multimodal interaction encoder. Secondly, to globally align the visual and textual embeddings, Similarity Distribution Matching is proposed to minimize the KL divergence between image-text similarity distributions and the normalized label matching distributions. The proposed method achieves new state-of-the-art results on all three public datasets, with a notable margin of about 3%-9% for Rank-1 accuracy compared to prior methods.
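
The Similarity Distribution Matching idea described in the abstract can be illustrated with a small sketch. It is written in plain Python so that it runs without dependencies (the official repository is listed as PyTorch), and it is not the authors' implementation. The function names, the temperature of 0.02, the epsilon term, the KL direction (predicted distribution against label distribution), and the choice to sum the image-to-text and text-to-image terms are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    peak = max(row)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_total for x in row]


def mean_kl(logits, labels, eps):
    """Average over rows of KL(p || q): p = softmax(logits), q = normalized labels."""
    total = 0.0
    for logit_row, label_row in zip(logits, labels):
        log_p = log_softmax(logit_row)
        label_sum = sum(label_row)  # at least 1, since each sample matches itself
        q = [y / label_sum for y in label_row]
        total += sum(
            math.exp(lp) * (lp - math.log(qj + eps)) for lp, qj in zip(log_p, q)
        )
    return total / len(logits)


def sdm_loss(image_feats, text_feats, person_ids, temperature=0.02, eps=1e-8):
    """Hypothetical SDM-style loss for a batch of paired image/text embeddings.

    image_feats[i] and text_feats[i] describe the same person, person_ids[i].
    The temperature and eps defaults are assumptions, not values from the abstract.
    """
    images = [l2_normalize(v) for v in image_feats]
    texts = [l2_normalize(v) for v in text_feats]
    n = len(person_ids)
    # Temperature-scaled cosine similarity of every image against every text.
    i2t = [
        [sum(a * b for a, b in zip(images[i], texts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    t2i = [list(col) for col in zip(*i2t)]  # transpose for the text-to-image direction
    # Label matrix: 1 where two samples share an identity, 0 otherwise.
    labels = [
        [1.0 if person_ids[i] == person_ids[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    return mean_kl(i2t, labels, eps) + mean_kl(t2i, labels, eps)


if __name__ == "__main__":
    feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    ids = [0, 1, 2]
    print("aligned pairs:  ", sdm_loss(feats, feats, ids))
    print("shuffled texts: ", sdm_loss(feats, feats[1:] + feats[:1], ids))
```

In this toy batch the loss is close to zero when each image embedding coincides with its own text embedding and grows large when the texts are shuffled, which is the behavior a distribution-matching objective of this kind is meant to have.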

Code Repositories

anosorae/irra (Official, pytorch, Mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
nlp-based-person-retrival-on-cuhk-pedes | IRRA | R@1: 73.38, R@5: 89.93, R@10: 93.71, mAP: 66.13, mINP: 50.24
text-based-person-retrieval-on-icfg-pedes | IRRA | R@1: 63.46, R@5: 80.25, R@10: 85.82, mAP: 38.06, mINP: 7.93
text-based-person-retrieval-on-rstpreid-1 | IRRA | R@1: 60.20, R@5: 81.30, R@10: 88.20
text-based-person-retrieval-with-noisy | IRRA | Rank-1: 69.74, Rank-5: 87.09, Rank-10: 92.20, mAP: 62.28, mINP: 45.84
text-based-person-retrieval-with-noisy-1 | IRRA | Rank-1: 60.76, Rank-5: 78.26, Rank-10: 84.01, mAP: 35.87, mINP: 6.80
text-based-person-retrieval-with-noisy-2 | IRRA | Rank-1: 58.75, Rank-5: 81.90, Rank-10: 88.25, mAP: 46.38, mINP: 24.78
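
The page does not define the metrics listed above. The sketch below shows how R@K (Rank-K), mAP, and mINP are commonly computed for retrieval, assuming the usual person re-identification definitions: R@K is the share of queries with at least one correct image in the top K, AP averages the precision at each correct hit, and INP divides the number of correct gallery images by the rank of the last (hardest) one. The function names and the similarity-matrix input format are hypothetical, and the benchmarks may apply extra filtering rules not shown here.

```python
def ranked_matches(scores, query_id, gallery_ids):
    """Sort gallery items by descending score and flag which ones match the query."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return [gallery_ids[j] == query_id for j in order]


def retrieval_metrics(sim, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Return R@K, mAP and mINP as percentages.

    sim[i][j] is the similarity of text query i to gallery image j.
    Queries with no correct gallery image are skipped.
    """
    hits_at_k = {k: 0 for k in ks}
    ap_sum = 0.0
    inp_sum = 0.0
    n_valid = 0
    for scores, query_id in zip(sim, query_ids):
        matches = ranked_matches(scores, query_id, gallery_ids)
        # 1-based rank positions of every correct gallery image.
        positions = [rank for rank, hit in enumerate(matches, start=1) if hit]
        if not positions:
            continue
        n_valid += 1
        for k in ks:
            hits_at_k[k] += positions[0] <= k
        # Precision at the n-th hit is n / rank; AP averages it over all hits.
        ap_sum += sum((n + 1) / rank for n, rank in enumerate(positions)) / len(positions)
        # Inverse negative penalty: penalizes a late rank for the hardest match.
        inp_sum += len(positions) / positions[-1]
    if n_valid == 0:
        raise ValueError("no query has a matching gallery item")
    result = {f"R@{k}": 100.0 * hits_at_k[k] / n_valid for k in ks}
    result["mAP"] = 100.0 * ap_sum / n_valid
    result["mINP"] = 100.0 * inp_sum / n_valid
    return result


if __name__ == "__main__":
    sim = [[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.2, 0.3]]
    print(retrieval_metrics(sim, query_ids=[7, 7], gallery_ids=[7, 3, 3, 7]))
```

Under these definitions R@K can only stay level or rise as K grows, so R@1 ≤ R@5 ≤ R@10 for any method, while a low mINP next to a high R@1 indicates that some correct images are ranked far down the list.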
