5 months ago

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Zhao Zihua ; Chen Mengxi ; Dai Tianjie ; Yao Jiangchao ; Han Bo ; Zhang Ya ; Wang Yanfeng


Abstract

Noisy correspondence, which refers to mismatches in cross-modal data pairs, is prevalent in human-annotated or web-crawled datasets. Prior approaches to leveraging such data mainly apply uni-modal noisy-label learning without amending the impact on both the cross-modal and intra-modal geometrical structures in multimodal learning. We find that both structures, when well established, are effective for discriminating noisy correspondence through structural differences. Inspired by this observation, we introduce a Geometrical Structure Consistency (GSC) method to infer the true correspondence. Specifically, GSC ensures the preservation of geometrical structures within and between modalities, allowing for the accurate discrimination of noisy samples based on structural differences. Utilizing these inferred true correspondence labels, GSC refines the learning of geometrical structures by filtering out the noisy samples. Experiments across four cross-modal datasets confirm that GSC effectively identifies noisy samples and significantly outperforms the current leading methods.
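The idea of discriminating noisy pairs by structural differences can be illustrated with a toy sketch. It is not taken from the paper or from the MediaBrain-SJTU/GSC repository: the function names, the use of cosine similarity, the way the two scores are combined, and the 0.5 threshold are all assumptions made for illustration only. Each image-text pair receives a cross-modal score (how similar the paired embeddings are) and an intra-modal score (how well the image's similarities to the other images agree with the text's similarities to the other texts).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def consistency_scores(img_emb, txt_emb):
    """Return one (cross_modal, intra_modal) score tuple per pair.

    Hypothetical scoring, assumed for illustration:
    - cross_modal: similarity between the paired image and text.
    - intra_modal: agreement between the image's similarities to all
      other images and the text's similarities to all other texts.
    """
    n = len(img_emb)
    scores = []
    for i in range(n):
        cross = cosine(img_emb[i], txt_emb[i])
        img_view = [cosine(img_emb[i], img_emb[j]) for j in range(n) if j != i]
        txt_view = [cosine(txt_emb[i], txt_emb[j]) for j in range(n) if j != i]
        intra = cosine(img_view, txt_view)
        scores.append((cross, intra))
    return scores


def flag_noisy(scores, threshold=0.5):
    """Flag a pair as noisy when either score falls below the threshold."""
    return [min(cross, intra) < threshold for cross, intra in scores]


# Toy 2-D embeddings: the first three pairs match, the last is mismatched.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.1]]
texts = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.9], [0.0, 1.0]]

scores = consistency_scores(images, texts)
flags = flag_noisy(scores)
print(flags)
```

In this toy setup only the last pair is flagged. Per the abstract, the actual method additionally uses the inferred correspondence labels to filter noisy samples and refine the learned structures, which this sketch does not attempt.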

Code Repositories

MediaBrain-SJTU/GSC (official, PyTorch)

Benchmarks

I2T = image-to-text, T2I = text-to-image.

Benchmark                           Methodology  I2T R@1  I2T R@5  I2T R@10  T2I R@1  T2I R@5  T2I R@10  R-Sum
cross-modal-retrieval-with-noisy-1  GSC-SGR      42.1     68.4     77.7      42.2     67.6     77.1      375.1
cross-modal-retrieval-with-noisy-2  GSC-SGR      78.3     94.6     97.8      60.1     84.5     90.5      505.8
cross-modal-retrieval-with-noisy-3  GSC-SGR      79.5     96.4     98.9      64.4     90.6     95.9      525.7
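
The R-Sum values above are consistent with the sum of the six recall values (R@1, R@5, and R@10 in both retrieval directions). The page does not state this definition, so it is treated as an assumption in the short check below. The dictionary layout and the `r_sum` helper are hypothetical names introduced for the example.

```python
# Recall values (in percent) copied from the benchmark entries above,
# ordered as (R@1, R@5, R@10) for each retrieval direction.
results = {
    "cross-modal-retrieval-with-noisy-1": {
        "image_to_text": (42.1, 68.4, 77.7),
        "text_to_image": (42.2, 67.6, 77.1),
    },
    "cross-modal-retrieval-with-noisy-2": {
        "image_to_text": (78.3, 94.6, 97.8),
        "text_to_image": (60.1, 84.5, 90.5),
    },
    "cross-modal-retrieval-with-noisy-3": {
        "image_to_text": (79.5, 96.4, 98.9),
        "text_to_image": (64.4, 90.6, 95.9),
    },
}


def r_sum(entry):
    """Assumed definition: sum of all six recall values, to one decimal."""
    total = sum(entry["image_to_text"]) + sum(entry["text_to_image"])
    return round(total, 1)


for name, entry in results.items():
    print(name, r_sum(entry))
```

The three computed sums match the reported R-Sum figures of 375.1, 505.8, and 525.7.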
