CLIM: Contrastive Language-Image Mosaic for Region Representation

Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy

Abstract

Detecting objects accurately from a large or open vocabulary requires vision-language alignment on region representations. However, learning such region-text alignment from high-quality box annotations with text labels or descriptions is expensive and infeasible. In contrast, image-text pairs are simpler to collect but lack the precise object location information needed to associate regions with texts. In this paper, we propose Contrastive Language-Image Mosaic (CLIM), a novel approach that leverages large-scale image-text pairs to align region and text representations. CLIM combines multiple images into a mosaicked image and treats each image as a "pseudo region". The feature of each pseudo region is extracted and trained, via a contrastive loss, to be similar to its corresponding text embedding and dissimilar from the others, enabling the model to learn region-text alignment without costly box annotations. As a generally applicable approach, CLIM consistently improves different open-vocabulary object detection methods that use caption supervision. Furthermore, CLIM can effectively enhance the region representations of vision-language models, thus providing stronger backbones for open-vocabulary object detectors. Our experimental results demonstrate that CLIM improves different baseline open-vocabulary object detectors by a large margin on both the OV-COCO and OV-LVIS benchmarks. The code is available at https://github.com/wusize/CLIM.
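The mosaic-and-contrast idea described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation (see the official repository for that). The function names `make_mosaic` and `contrastive_loss`, the square grid layout, the symmetric form of the loss, and the temperature value 0.07 are all assumptions made for illustration. Images are plain nested lists, and the region and text features are assumed to come from some encoder that is not shown.

```python
import math

def make_mosaic(images, grid):
    """Tile grid*grid equally sized images (2D lists) into one mosaic.

    Returns the mosaic and one box (x0, y0, x1, y1) per source image;
    each box is the "pseudo region" that image occupies in the mosaic.
    """
    if len(images) != grid * grid:
        raise ValueError("expected grid*grid images")
    h, w = len(images[0]), len(images[0][0])
    mosaic = [[0.0] * (w * grid) for _ in range(h * grid)]
    boxes = []
    for idx, img in enumerate(images):
        r, c = divmod(idx, grid)  # row and column of this tile
        for y in range(h):
            mosaic[r * h + y][c * w:(c + 1) * w] = img[y]
        boxes.append((c * w, r * h, (c + 1) * w, (r + 1) * h))
    return mosaic, boxes

def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _cross_entropy_diag(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[i]
    return total / len(logits)

def contrastive_loss(region_feats, text_embeds, temperature=0.07):
    """Symmetric InfoNCE-style loss between pseudo-region features and
    text embeddings; the i-th region is paired with the i-th text.
    The symmetric form and the temperature are assumptions.
    """
    regions = [_normalize(v) for v in region_feats]
    texts = [_normalize(v) for v in text_embeds]
    logits = [[sum(a * b for a, b in zip(r, t)) / temperature for t in texts]
              for r in regions]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-region view
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

In a real pipeline, the boxes returned by `make_mosaic` would be used to pool one feature per pseudo region from the detector's or backbone's feature map, and those features would be passed to the loss together with the embeddings of the matching captions. The loss is small when each region feature points in the same direction as its own caption embedding and large when the pairing is scrambled.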

Code Repositories

wusize/clim (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
Open-vocabulary object detection on LVIS v1.0 | CLIM (RN50x64) | AP novel-LVIS base training: 32.3
Open-vocabulary object detection on MS COCO | CLIM (RN50) | AP 0.5: 36.9
