MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Junjie Zhou, Zheng Liu, Ze Liu, Shitao Xiao, Yueze Wang, Bo Zhao, Chen Jason Zhang, Defu Lian, Yongping Xiong

Abstract

Despite the rapidly growing demand for multimodal retrieval, progress in this field remains severely constrained by a lack of training data. In this paper, we introduce MegaPairs, a novel data synthesis method that leverages vision-language models (VLMs) and open-domain images, together with a massive synthetic dataset generated from this method. Our empirical analysis shows that MegaPairs generates high-quality data, enabling the multimodal retriever to significantly outperform the baseline model trained on 70× more data from existing datasets. Moreover, since MegaPairs solely relies on general image corpora and open-source VLMs, it can be easily scaled up, enabling continuous improvements in retrieval performance. In this stage, we produced more than 26 million training instances and trained several models of varying sizes using this data. These new models achieve state-of-the-art zero-shot performance across 4 popular composed image retrieval (CIR) benchmarks and the highest overall performance on the 36 datasets provided by MMEB. They also demonstrate notable performance improvements with additional downstream fine-tuning. Our produced dataset, well-trained models, and data synthesis pipeline will be made publicly available to facilitate the future development of this field.

Code Repositories

VectorSpaceLab/MegaPairs
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
image-retrieval-on-cirr | MMRet-MLLM | Recall@10: 85.1
image-retrieval-on-fashion-iq | MMRet-MLLM | (Recall@10+Recall@50)/2: 46.1; Recall@10: 35.6
zero-shot-composed-image-retrieval-zs-cir-on | MMRet-Base (CLIP B/16) | mAP@10: 35.0
zero-shot-composed-image-retrieval-zs-cir-on | MMRet-Large (CLIP L/14) | mAP@10: 40.2
zero-shot-composed-image-retrieval-zs-cir-on | MMRet-MLLM | mAP@10: 43.4
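
The metrics reported above (Recall@K, the averaged Recall@10/Recall@50 score, and mAP@K) can be illustrated with a minimal Python sketch. This is not the evaluation code used by the authors or by any benchmark; the function names are hypothetical, and the mAP@K normalization by min(K, number of relevant items) is an assumption, since benchmarks differ in how they normalize. Scores are returned as fractions in [0, 1], whereas the table lists percentages.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if any relevant item appears in the top-k results, else 0.0."""
    return float(any(item in relevant for item in ranked[:k]))


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Average precision over the top-k results.

    Assumption: normalized by min(k, number of relevant items), and
    `ranked` contains no duplicate items.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(relevant))


def averaged_recall(ranked: Sequence[str], relevant: Set[str], k_small: int = 10, k_large: int = 50) -> float:
    """Mean of Recall@k_small and Recall@k_large, i.e. (Recall@10 + Recall@50) / 2 by default."""
    return (recall_at_k(ranked, relevant, k_small) + recall_at_k(ranked, relevant, k_large)) / 2


def mean_over_queries(scores: Sequence[float]) -> float:
    """Average a per-query score over all queries (0.0 for an empty list)."""
    return sum(scores) / len(scores) if scores else 0.0


# Toy query: four ranked candidates, two of which are relevant.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
r1 = recall_at_k(ranked, relevant, 1)
r3 = recall_at_k(ranked, relevant, 3)
ap3 = average_precision_at_k(ranked, relevant, 3)
```

A dataset-level number would be obtained by computing the per-query score for every query and passing the list to `mean_over_queries`, then multiplying by 100 to match the percentage scale used in the table.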
