3 months ago

UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching

Jianjia Cao, Nannan Wang, Xing Xu, Yiu-ming Cheung, Xin Liu, Quanxing Zha

Abstract

Cross-modal matching has recently gained significant popularity as a way to facilitate retrieval across multi-modal data, and existing works rely heavily on the implicit assumption that the training data pairs are perfectly aligned. However, this ideal assumption rarely holds in practice because some data pairs are inevitably mismatched, a.k.a. noisy correspondence, which wrongly forces mismatched data to be similar and thus degrades performance. Although some recent methods have attempted to address this problem, they still face two challenging issues: 1) unreliable data division, which causes training inefficiency, and 2) unstable prediction, which causes matching failure. To address these problems, we propose an efficient Uncertainty-Guided Noisy Correspondence Learning (UGNCL) framework to achieve noise-robust cross-modal matching. Specifically, a novel Uncertainty Guided Division (UGD) algorithm is designed to leverage the derived uncertainty to divide the data into clean, noisy and hard partitions, which mitigates the impact of easily determined noisy pairs. Meanwhile, an efficient Trusted Robust Loss (TRL) is explicitly designed to use the uncertainty to recast the soft margins, which are calibrated by confident yet erroneous soft correspondence labels, for the data pairs in the hard partition. This increases the importance of matched pairs, decreases the importance of mismatched pairs, and further alleviates the impact of noisy pairs to improve robustness. Extensive experiments conducted on three public datasets highlight the superiority of the proposed framework and show its competitive performance compared with state-of-the-art methods. The code is available at https://github.com/qxzha/UGNCL.
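To make the two ideas in the abstract more concrete, here is a minimal Python sketch of a three-way data division and a margin scaled by a soft correspondence label. It is an illustration only and not the paper's UGD or TRL: the function names, the thresholds `conf_threshold` and `unc_threshold`, the `base_margin` value, and the rule "high uncertainty means hard" are all assumptions introduced here. The authors' actual formulation is in the linked repository.

```python
from typing import List, Sequence


def divide_pairs(
    confidence: Sequence[float],
    uncertainty: Sequence[float],
    conf_threshold: float = 0.5,  # hypothetical value, not from the paper
    unc_threshold: float = 0.3,   # hypothetical value, not from the paper
) -> List[str]:
    """Label each pair 'clean', 'noisy', or 'hard'.

    Assumed rule: when uncertainty is low, trust the confidence score;
    when uncertainty is high, defer the decision by marking the pair 'hard'.
    """
    labels = []
    for c, u in zip(confidence, uncertainty):
        if u > unc_threshold:
            labels.append("hard")
        elif c >= conf_threshold:
            labels.append("clean")
        else:
            labels.append("noisy")
    return labels


def soft_margin_hinge(
    pos_sim: float,
    neg_sim: float,
    soft_label: float,
    base_margin: float = 0.2,  # hypothetical value, not from the paper
) -> float:
    """Hinge-style ranking loss whose margin shrinks with the soft label.

    A soft label near 1 (likely matched) keeps the full margin; a soft label
    near 0 (likely mismatched) removes it, so the pair contributes less.
    """
    margin = base_margin * soft_label
    return max(0.0, margin - pos_sim + neg_sim)


if __name__ == "__main__":
    print(divide_pairs([0.9, 0.1, 0.6], [0.1, 0.1, 0.8]))
    print(soft_margin_hinge(pos_sim=0.5, neg_sim=0.4, soft_label=1.0))
```

In this sketch only the pairs labeled "hard" would be passed to the soft-margin loss, while "noisy" pairs could simply be excluded from the positive-pair objective. That split is again an assumption based on the abstract's description.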

Benchmarks

Benchmark	Methodology	Image-to-text R@1	Image-to-text R@5	Image-to-text R@10	Text-to-image R@1	Text-to-image R@5	Text-to-image R@10	R-Sum
cross-modal-retrieval-with-noisy-1	UGNCL	43.6	67.1	74.9	42.7	68.4	76.4	373.1
cross-modal-retrieval-with-noisy-2	UGNCL	78.4	95.8	97.8	59.8	84.3	89.5	505.6
cross-modal-retrieval-with-noisy-3	UGNCL	79.5	97.2	99.0	63.7	90.9	96.0	526.3
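The R-Sum values reported above are consistent with summing the six recall figures in each row (R@1, R@5 and R@10 in both retrieval directions). The short Python check below reproduces them from the listed metrics. The helper name `r_sum` is introduced here only for illustration.

```python
from typing import Sequence


def r_sum(image_to_text: Sequence[float], text_to_image: Sequence[float]) -> float:
    """Sum of R@1, R@5, R@10 for both directions, rounded to one decimal."""
    return round(sum(image_to_text) + sum(text_to_image), 1)


# Recall values copied from the benchmark rows, ordered as (R@1, R@5, R@10).
rows = {
    "cross-modal-retrieval-with-noisy-1": ((43.6, 67.1, 74.9), (42.7, 68.4, 76.4)),
    "cross-modal-retrieval-with-noisy-2": ((78.4, 95.8, 97.8), (59.8, 84.3, 89.5)),
    "cross-modal-retrieval-with-noisy-3": ((79.5, 97.2, 99.0), (63.7, 90.9, 96.0)),
}

if __name__ == "__main__":
    for name, (i2t, t2i) in rows.items():
        print(name, r_sum(i2t, t2i))
```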
