MaskCLIP++
Quan-Sheng Zeng; Yunheng Li; Daquan Zhou; Guanbin Li; Qibin Hou; Ming-Ming Cheng

Abstract
Open-vocabulary image segmentation has been advanced through the synergy between mask generators and vision-language models like Contrastive Language-Image Pre-training (CLIP). Previous approaches focus on generating masks while aligning mask features with text embeddings during training. In this paper, we observe that relying on generated low-quality masks can weaken the alignment of vision and language in regional representations. This motivates us to present a new fine-tuning framework, named MaskCLIP++, which uses ground-truth masks instead of generated masks to enhance the mask classification capability of CLIP. Due to the limited diversity of image segmentation datasets with mask annotations, we propose incorporating a consistency alignment principle during fine-tuning, which alleviates categorical bias toward the fine-tuning dataset. After low-cost fine-tuning, MaskCLIP++ significantly improves mask classification performance on multi-domain datasets. Combined with the mask generators from previous state-of-the-art mask-based open-vocabulary segmentation methods, we achieve performance improvements of +1.7, +2.3, +2.1, +3.1, and +0.3 mIoU on the A-847, PC-459, A-150, PC-59, and PAS-20 datasets, respectively. Code is available at https://github.com/HVision-NKU/MaskCLIPpp.
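To make the mask-classification setting concrete, below is a minimal PyTorch sketch of the generic "pool CLIP features inside a mask, then score against category text embeddings" recipe that mask-based open-vocabulary methods build on. It is an illustration only, not the paper's implementation: the function names, tensor shapes, and temperature value are assumptions, and real CLIP dense features and text embeddings would replace the random tensors.

```python
import torch
import torch.nn.functional as F

def mask_pool(features: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Average-pool dense image features inside each mask.

    features: (C, H, W) dense feature map from a CLIP image encoder.
    masks:    (N, H, W) binary masks.
    Returns:  (N, C), one embedding per mask.
    """
    masks = masks.float()
    # Sum features over each mask region, then divide by the mask area.
    pooled = torch.einsum("chw,nhw->nc", features, masks)
    area = masks.sum(dim=(1, 2)).clamp(min=1.0).unsqueeze(1)
    return pooled / area

def classify_masks(mask_emb: torch.Tensor, text_emb: torch.Tensor,
                   temperature: float = 0.01) -> torch.Tensor:
    """Score each mask embedding against category text embeddings.

    mask_emb: (N, C) mask embeddings; text_emb: (K, C) CLIP text
    embeddings of the K category names. Returns (N, K) logits.
    """
    mask_emb = F.normalize(mask_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return mask_emb @ text_emb.t() / temperature

# Toy usage with random tensors standing in for real CLIP outputs.
features = torch.randn(512, 32, 32)    # dense image features (assumed shape)
masks = torch.rand(5, 32, 32) > 0.5    # 5 candidate masks
text_emb = torch.randn(150, 512)       # e.g. 150 category name embeddings
logits = classify_masks(mask_pool(features, masks), text_emb)
print(logits.shape)                    # torch.Size([5, 150])
```

In MaskCLIP++'s fine-tuning setup, the masks fed into such a pooling step are ground-truth annotations rather than generated proposals; at inference, they come from an off-the-shelf mask generator.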
Code Repositories
https://github.com/HVision-NKU/MaskCLIPpp
Benchmarks
| Benchmark (dataset) | Method | mIoU |
|---|---|---|
| Open-Vocabulary Semantic Segmentation on A-847 | MaskCLIP++ | 16.8 |
| Open-Vocabulary Semantic Segmentation on PC-459 | MaskCLIP++ | 23.9 |
| Open-Vocabulary Semantic Segmentation on A-150 | MaskCLIP++ | 38.2 |
| Open-Vocabulary Semantic Segmentation on PC-59 | MaskCLIP++ | 62.5 |
| Open-Vocabulary Semantic Segmentation on PAS-20 | MaskCLIP++ | 96.8 |