5 months ago

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

Han Haochen ; Zheng Qinghua ; Dai Guang ; Luo Minnan ; Wang Jingdong

Abstract

Collecting well-matched multimedia datasets is crucial for training cross-modal retrieval models. However, in real-world scenarios, massive multimodal data are harvested from the Internet and inevitably contain Partially Mismatched Pairs (PMPs). Such semantically irrelevant data markedly harm cross-modal retrieval performance. Previous efforts tend to mitigate this problem by estimating a soft correspondence to down-weight the contribution of PMPs. In this paper, we address the challenge from a new perspective: the potential semantic similarity among unpaired samples makes it possible to excavate useful knowledge from mismatched pairs. To achieve this, we propose L2RM, a general framework based on Optimal Transport (OT) that learns to rematch mismatched pairs. In detail, L2RM aims to generate refined alignments by seeking a minimal-cost transport plan across different modalities. To formalize the rematching idea in OT, we first propose a self-supervised cost function that automatically learns from an explicit similarity-cost mapping relation. Second, we model a partial OT problem while restricting the transport among false positives to further boost refined alignments. Extensive experiments on three benchmarks demonstrate that L2RM significantly improves the robustness of existing models against PMPs. The code is available at https://github.com/hhc1997/L2RM.
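
The rematching idea in the abstract can be illustrated with a small sketch. This is not the L2RM implementation: L2RM learns its cost function and solves a partial OT problem, whereas the sketch below assumes a fixed cost of one minus cosine similarity and a balanced, entropy-regularised OT problem solved with Sinkhorn iterations. The function names (`cosine_cost`, `sinkhorn_plan`), the regularisation strength, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_cost(img, txt):
    """Cost matrix: 1 - cosine similarity (an assumed, fixed cost; L2RM learns its cost)."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    return 1.0 - img @ txt.T


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularised OT plan between uniform marginals via Sinkhorn iterations."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # uniform mass over images
    b = np.full(m, 1.0 / m)      # uniform mass over texts
    K = np.exp(-cost / reg)      # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows to match marginal a
        v = b / (K.T @ u)        # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


# Toy batch: image i was built from text perm[i], so the nominal pairs (0, 0)
# and (1, 1) are mismatched and the true partners are swapped.
rng = np.random.default_rng(0)
txt = rng.normal(size=(4, 16))
perm = np.array([1, 0, 2, 3])
img = txt[perm] + 0.01 * rng.normal(size=(4, 16))

plan = sinkhorn_plan(cosine_cost(img, txt))
rematch = plan.argmax(axis=1)    # text receiving the most mass from each image
```

In this toy setting the transport plan concentrates its mass on the semantically closest text for each image, so `rematch` can serve as a refined alignment target in place of the noisy nominal pairing.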

Code Repositories

hhc1997/l2rm (Official, PyTorch)

Benchmarks

| Benchmark | Methodology | Image-to-text R@1 | Image-to-text R@5 | Image-to-text R@10 | Text-to-image R@1 | Text-to-image R@5 | Text-to-image R@10 | R-Sum |
|---|---|---|---|---|---|---|---|---|
| cross-modal-retrieval-with-noisy-1 | L2RM-SGRAF | 43.0 | 67.5 | 75.7 | 42.8 | 68.0 | 77.2 | 374.2 |
| cross-modal-retrieval-with-noisy-2 | L2RM-SGRAF | 77.9 | 95.2 | 97.8 | 59.8 | 83.6 | 89.5 | 503.8 |
| cross-modal-retrieval-with-noisy-3 | L2RM-SCARF | 80.2 | 96.3 | 98.5 | 64.2 | 90.1 | 95.4 | 524.7 |
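
The R-Sum figures listed above are consistent with the usual convention of summing the six recall values (R@1, R@5, R@10 in both retrieval directions). The sketch below shows that arithmetic together with a minimal Recall@K helper; the function names and the example ranks are hypothetical, and the only numbers taken from this page are those of the first benchmark entry.

```python
def recall_at_k(ranks, k):
    """Percentage of queries whose ground-truth item is ranked in the top k.

    `ranks` holds the 0-based rank of the correct item for each query.
    """
    return 100.0 * sum(r < k for r in ranks) / len(ranks)


def r_sum(image_to_text, text_to_image):
    """Sum of R@1, R@5 and R@10 over both retrieval directions."""
    return sum(image_to_text) + sum(text_to_image)


# Reported values for the first benchmark entry, ordered (R@1, R@5, R@10).
i2t = (43.0, 67.5, 75.7)
t2i = (42.8, 68.0, 77.2)
total = round(r_sum(i2t, t2i), 1)

# Hypothetical ranks for four queries, only to exercise recall_at_k.
example_ranks = [0, 3, 7, 12]
example_r5 = recall_at_k(example_ranks, 5)
```

Here `total` reproduces the listed R-Sum of 374.2 for that entry.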
