Learning with Noisy Correspondence for Cross-modal Matching

Xi Peng, Hua Wu, Xinyan Xiao, Wenbiao Ding, Xiao Liu, Guocheng Niu, Zhenyu Huang

Abstract

Cross-modal matching, which aims to establish the correspondence between two different modalities, is fundamental to a variety of tasks such as cross-modal retrieval and vision-and-language understanding. Although a huge number of cross-modal matching methods have been proposed and have achieved remarkable progress in recent years, almost all of them implicitly assume that the multimodal training data are correctly aligned. In practice, however, this assumption is extremely expensive or even impossible to satisfy. Based on this observation, we reveal and study a latent and challenging direction in cross-modal matching, named noisy correspondence, which can be regarded as a new paradigm of noisy labels. Unlike traditional noisy labels, which mainly refer to errors in category labels, noisy correspondence refers to mismatched paired samples. To solve this new problem, we propose a novel method for learning with noisy correspondence, named Noisy Correspondence Rectifier (NCR). In brief, NCR divides the data into clean and noisy partitions based on the memorization effect of neural networks and then rectifies the correspondence via an adaptive prediction model in a co-teaching manner. We use image-text matching as a showcase, and extensive experiments on Flickr30K, MS-COCO, and Conceptual Captions verify the effectiveness of our method. The code can be accessed from www.pengxi.me.
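The clean/noisy division described in the abstract can be illustrated with a minimal sketch. It rests on the memorization effect: early in training, correctly matched pairs tend to reach a low loss sooner than mismatched ones, so per-sample losses form two clusters. The abstract does not say how the split is computed; fitting a two-component one-dimensional Gaussian mixture to the losses is an assumption made here for illustration, and the function name `fit_two_gaussians`, the 0.5 threshold, and the synthetic losses are all hypothetical.

```python
import math
import random


def fit_two_gaussians(losses, iters=50):
    """Fit a 1-D two-component Gaussian mixture with EM.

    Returns, for every loss value, the posterior probability that it
    belongs to the component with the lower mean (the "clean" one).
    """
    lo, hi = min(losses), max(losses)
    means = [lo, hi]                               # start at the extremes
    var0 = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in losses]
    n = len(losses)

    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        for i, x in enumerate(losses):
            dens = [
                weights[k]
                * math.exp(-((x - means[k]) ** 2) / (2.0 * variances[k]))
                / math.sqrt(2.0 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens)
            resp[i] = [d / total for d in dens] if total > 0 else [0.5, 0.5]

        # M-step: re-estimate weight, mean, and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk,
                1e-6,
            )
            weights[k] = nk / n

    clean = 0 if means[0] <= means[1] else 1
    return [r[clean] for r in resp]


# Synthetic per-sample losses (assumed values, not taken from the paper):
# 80 well-matched pairs with low loss, 20 mismatched pairs with high loss.
random.seed(0)
losses = [random.gauss(0.2, 0.05) for _ in range(80)]
losses += [random.gauss(1.0, 0.10) for _ in range(20)]

p_clean = fit_two_gaussians(losses)
is_clean = [p > 0.5 for p in p_clean]              # hypothetical threshold
print(sum(is_clean), "pairs flagged as clean out of", len(losses))
```

In a co-teaching setup, two networks would each compute such a split and hand it to the other, so that neither network reinforces its own selection errors. The sketch covers only the division step, not the rectification step.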

Benchmarks

| Benchmark | Method | Image-to-text R@1 | Image-to-text R@5 | Image-to-text R@10 | Text-to-image R@1 | Text-to-image R@5 | Text-to-image R@10 | R-Sum |
|---|---|---|---|---|---|---|---|---|
| cross-modal-retrieval-with-noisy-1 | NCR | 39.5 | 64.5 | 73.5 | 40.3 | 64.6 | 73.2 | 355.6 |
| cross-modal-retrieval-with-noisy-2 | NCR | 75.0 | 93.9 | 97.5 | 58.3 | 83.0 | 89.0 | 496.7 |
| cross-modal-retrieval-with-noisy-3 | NCR | 77.7 | 95.5 | 98.2 | 62.5 | 89.3 | 95.3 | 518.5 |
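
The R-Sum figures above are consistent with the sum of the six recall values for each benchmark (for example, 39.5 + 64.5 + 73.5 + 40.3 + 64.6 + 73.2 = 355.6). A minimal sketch of how Recall@K and R-Sum could be computed from an image-text similarity matrix follows. It assumes a one-to-one setup in which image i matches text i, which is a simplification, and the names `recall_at_k` and `r_sum` are hypothetical, not taken from the authors' code.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose true match ranks in the top k.

    sim[i][j] is the similarity between query i and candidate j;
    the true match of query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of candidates scoring strictly higher than the match
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def r_sum(sim, ks=(1, 5, 10)):
    """Sum of Recall@K over both retrieval directions."""
    transposed = [list(col) for col in zip(*sim)]            # texts as queries
    image_to_text = [recall_at_k(sim, k) for k in ks]
    text_to_image = [recall_at_k(transposed, k) for k in ks]
    return sum(image_to_text) + sum(text_to_image)


# Toy 3x3 similarity matrix (rows: images, columns: texts), made-up values.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],   # image 1 scores text 0 above its true match
    [0.2, 0.1, 0.7],
]

i2t_r1 = recall_at_k(sim, 1)        # 2 of 3 images retrieve their text first
toy_r_sum = r_sum(sim, ks=(1, 2))   # small K values for the toy example

# Check the first benchmark's R-Sum against its six listed recalls
listed = [39.5, 64.5, 73.5, 40.3, 64.6, 73.2]
listed_sum = round(sum(listed), 1)
```

A perfect system would score 100 on all six recalls, so R-Sum with K in (1, 5, 10) is bounded above by 600.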
