HyperAI

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai

Abstract

Recently, open-vocabulary image classification via vision-language pre-training has achieved remarkable results: a model can classify arbitrary categories without seeing any additional annotated images of those categories. However, it remains unclear how to make open-vocabulary recognition work well on broader vision problems. This paper targets open-vocabulary semantic segmentation by building on an off-the-shelf pre-trained vision-language model, i.e., CLIP. However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation processes pixels, while CLIP operates on whole images. To remedy this discrepancy in processing granularity, we forgo the prevalent one-stage FCN-based framework and advocate a two-stage semantic segmentation framework, in which the first stage extracts generalizable mask proposals and the second stage leverages an image-based CLIP model to perform open-vocabulary classification on the masked image crops generated in the first stage. Our experimental results show that this two-stage framework outperforms FCN when trained only on the COCO Stuff dataset and evaluated on other datasets without fine-tuning. Moreover, this simple framework surpasses the previous state of the art in zero-shot semantic segmentation by a large margin: +29.5 hIoU on the Pascal VOC 2012 dataset and +8.9 hIoU on the COCO Stuff dataset. With its simplicity and strong performance, we hope this framework can serve as a baseline to facilitate future research. The code is made publicly available at https://github.com/MendelXu/zsseg.baseline.
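The two-stage pipeline from the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the CLIP image and text encoders are replaced by hand-picked embedding tables, and the stage-1 mask proposal network is replaced by two fixed binary masks. Only the structure of the pipeline (mask proposals, per-crop embedding similarity against class prompts, pasting labels back to pixels) reflects the method described above.

```python
import numpy as np

def l2_normalize(x):
    # CLIP-style cosine similarity requires unit-norm embeddings.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy stand-in for CLIP's text encoder: one embedding per class prompt.
CLASS_PROMPTS = ["a photo of a cat", "a photo of grass"]
text_embeds = l2_normalize(np.array([[1.0, 0.2, 0.0],
                                     [0.0, 0.1, 1.0]]))

# Toy stand-in for CLIP's image encoder: one embedding per masked crop
# produced by stage 1 (crop 0 resembles "cat", crop 1 resembles "grass").
crop_embeds = l2_normalize(np.array([[0.90, 0.10, 0.05],
                                     [0.05, 0.00, 0.95]]))

# Stage 1: two mask proposals on a toy 4x4 image (top half / bottom half).
masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :2, :] = True
masks[1, 2:, :] = True

# Stage 2: open-vocabulary classification of each masked crop via
# cosine similarity between crop embeddings and class-prompt embeddings.
logits = crop_embeds @ text_embeds.T   # shape (num_masks, num_classes)
labels = logits.argmax(axis=1)         # per-mask class index

# Paste per-mask labels back into a per-pixel segmentation map.
seg = np.full((4, 4), -1, dtype=int)
for mask, label in zip(masks, labels):
    seg[mask] = label

print(labels)  # -> [0 1]: crop 0 classified as "cat", crop 1 as "grass"
```

Because classification happens per mask rather than per pixel, arbitrary class vocabularies can be swapped in at test time simply by re-encoding a new list of prompts, which is what makes the segmenter "open-vocabulary".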

Code Repositories

openrobotlab/ov_parts (JAX, mentioned in GitHub)
mendelxu/zsseg.baseline (official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
open-vocabulary-semantic-segmentation-on | SimSeg | mIoU: 34.5
open-vocabulary-semantic-segmentation-on-1 | SimSeg | mIoU: 47.7
open-vocabulary-semantic-segmentation-on-2 | SimSeg | mIoU: 20.5
open-vocabulary-semantic-segmentation-on-3 | SimSeg | mIoU: 7
open-vocabulary-semantic-segmentation-on-5 | ZSSeg | hIoU: 77.5
open-vocabulary-semantic-segmentation-on-coco | ZSSeg | hIoU: 37.8
zero-shot-semantic-segmentation-on-coco-stuff | zsseg | Inductive Setting hIoU: 36.3; Transductive Setting hIoU: 41.5
zero-shot-semantic-segmentation-on-pascal-voc | zsseg | Inductive Setting hIoU: 77.5; Transductive Setting hIoU: 79.3
