Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Xiaojun Chang, Jingdong Wang

Abstract

Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to build in practice. Recently, to alleviate expensive data collection, co-occurring pairs from the Internet are automatically harvested for training. However, this inevitably includes mismatched pairs, i.e., noisy correspondences, undermining supervision reliability and degrading performance. Current methods leverage deep neural networks' memorization effect to address noisy correspondences, but they overconfidently focus on similarity-guided training with hard negatives and suffer from self-reinforcing errors. In light of the above, we introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM). Specifically, by viewing sample matching as a classification task within the batch, we generate classification logits for the given sample. Instead of a single similarity score, we refine sample filtration through energy uncertainty and estimate the model's sensitivity to selected clean samples using swapped classification entropy, in view of the overall prediction distribution. Additionally, we propose cross-modal biased complementary learning to leverage negative matches overlooked in hard-negative training, further improving model optimization stability and curbing self-reinforcing errors. Extensive experiments on challenging benchmarks affirm the efficacy and efficiency of SREM.
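The abstract names three ingredients: in-batch classification logits for each pair, energy-based uncertainty for sample filtration, and swapped classification entropy as a sensitivity estimate. The minimal sketch below shows how such quantities could be computed from image and text embeddings; the function names, the temperature value, and the median-energy threshold are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def in_batch_logits(img_emb, txt_emb, temperature=0.05):
    """Treat matching inside a batch as classification: row i scores image i
    against every caption in the batch (temperature is an assumed hyperparameter)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    return img_emb @ txt_emb.t() / temperature

def energy_uncertainty(logits):
    """Free energy of the in-batch prediction distribution; lower energy means a
    more confident prediction, so it can serve as a filtration signal."""
    return -torch.logsumexp(logits, dim=-1)

def swapped_entropy(logits_i2t, logits_t2i):
    """Average prediction entropy over both retrieval directions, used here as a
    proxy for the model's sensitivity to a selected sample."""
    def entropy(logits):
        p = F.softmax(logits, dim=-1)
        return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)
    return 0.5 * (entropy(logits_i2t) + entropy(logits_t2i))

# Illustrative usage on random embeddings: keep pairs whose energy is below
# the batch median (the threshold choice is an assumption for this sketch).
img_emb, txt_emb = torch.randn(8, 256), torch.randn(8, 256)
logits_i2t = in_batch_logits(img_emb, txt_emb)
energy = energy_uncertainty(logits_i2t)
clean_mask = energy < energy.median()
sensitivity = swapped_entropy(logits_i2t, logits_i2t.t())
```

In this reading, the energy and entropy scores replace a single similarity value when deciding which pairs are treated as clean during training.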

Benchmarks

Benchmark: cross-modal-retrieval-with-noisy-1 | Methodology: SREM
  Image-to-text: R@1 40.9, R@5 67.5, R@10 77.1
  Text-to-image: R@1 41.5, R@5 68.2, R@10 77.0
  R-Sum: 372.2

Benchmark: cross-modal-retrieval-with-noisy-2 | Methodology: SREM
  Image-to-text: R@1 79.5, R@5 94.2, R@10 97.9
  Text-to-image: R@1 61.2, R@5 84.8, R@10 90.2
  R-Sum: 507.8

Benchmark: cross-modal-retrieval-with-noisy-3 | Methodology: SREM
  Image-to-text: R@1 78.5, R@5 96.8, R@10 98.8
  Text-to-image: R@1 63.8, R@5 90.4, R@10 95.8
  R-Sum: 524.1
