Image Segmentation Using Text and Image Prompts

Lüddecke Timo ; Ecker Alexander S.

Abstract

Image segmentation is usually addressed by training a model for a fixed set of object classes. Incorporating additional classes or more complex queries later is expensive as it requires re-training the model on a dataset that encompasses these expressions. Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. A prompt can be either a text or an image. This approach enables us to create a unified model (trained once) for three common segmentation tasks, which come with distinct challenges: referring expression segmentation, zero-shot segmentation and one-shot segmentation. We build upon the CLIP model as a backbone which we extend with a transformer-based decoder that enables dense prediction. After training on an extended version of the PhraseCut dataset, our system generates a binary segmentation map for an image based on a free-text prompt or on an additional image expressing the query. We analyze different variants of the latter image-based prompts in detail. This novel hybrid input allows for dynamic adaptation not only to the three segmentation tasks mentioned above, but to any binary segmentation task where a text or image query can be formulated. Finally, we find our system to adapt well to generalized queries involving affordances or properties. Code is available at https://eckerlab.org/code/clipseg.
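
The idea of one decoder serving both text and image prompts can be illustrated with a toy sketch. It is not the authors' implementation: random vectors stand in for CLIP embeddings and patch features, all weights are untrained, and the FiLM-style scale-and-shift conditioning is an assumption chosen for illustration, since the abstract does not say how the prompt is injected into the decoder. The names `segment`, `w_scale`, `w_shift` and `w_out` are hypothetical.

```python
import numpy as np


def l2_normalize(v):
    """Scale a vector to unit length, as is common for CLIP-style embeddings."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def segment(patch_features, prompt_embedding, w_scale, w_shift, w_out, threshold=0.5):
    """Toy prompt-conditioned decoder (hypothetical, untrained).

    patch_features:   (h, w, d_feat) per-patch image features
    prompt_embedding: (d_prompt,) embedding of a text OR an image prompt
    Returns a boolean (h, w) segmentation map.
    """
    # Assumed FiLM-style conditioning: the prompt yields a per-channel scale and shift.
    scale = prompt_embedding @ w_scale              # (d_feat,)
    shift = prompt_embedding @ w_shift              # (d_feat,)
    conditioned = patch_features * scale + shift    # broadcast over (h, w)
    logits = conditioned @ w_out                    # (h, w)
    probs = 1.0 / (1.0 + np.exp(-logits))           # sigmoid
    return probs > threshold


rng = np.random.default_rng(0)
h, w, d_feat, d_prompt = 4, 4, 8, 16

patches = rng.normal(size=(h, w, d_feat))           # stand-in for backbone features
w_scale = rng.normal(size=(d_prompt, d_feat))
w_shift = rng.normal(size=(d_prompt, d_feat))
w_out = rng.normal(size=d_feat)

# Both prompt types live in the same embedding space, so the decoder is shared.
text_embedding = l2_normalize(rng.normal(size=d_prompt))    # stand-in for a text prompt
image_embedding = l2_normalize(rng.normal(size=d_prompt))   # stand-in for an image prompt

mask_from_text = segment(patches, text_embedding, w_scale, w_shift, w_out)
mask_from_image = segment(patches, image_embedding, w_scale, w_shift, w_out)
```

Because the decoder only ever sees a prompt vector, switching between a text query and an example image changes nothing in the decoding path, which is what lets a single trained model cover all three tasks.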

Code Repositories

openrobotlab/ov_parts (jax)
casia-iva-lab/fastsam (pytorch)
timojl/clipseg (official, pytorch)
huggingface/transformers (pytorch)

Benchmarks

| Benchmark | Methodology | MAD | MAD(E) | MSE | MSE(E) | SAD | SAD(E) |
|---|---|---|---|---|---|---|---|
| referring-image-matting-expression-based-on | CLIPSeg (ViT-B/16) | 0.0394 | 0.0419 | 0.0358 | 0.0381 | 69.13 | 73.53 |
| referring-image-matting-keyword-based-on | CLIPSeg (ViT-B/16) | 0.0101 | 0.0106 | 0.0064 | 0.0067 | 17.75 | 18.69 |
| referring-image-matting-refmatte-rw100-on | CLIPSeg (ViT-B/16) | 0.1222 | 0.1282 | 0.1178 | 0.1236 | 211.86 | 222.37 |
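
For readers unfamiliar with the MAD, MSE and SAD figures reported above, the sketch below computes them under their usual definitions for matting: mean absolute difference, mean squared error, and sum of absolute differences between a predicted and a ground-truth alpha matte. The function name `matting_errors` is hypothetical. Two caveats are assumptions rather than facts from this page: matting papers often divide SAD by 1000, and it is unknown whether this leaderboard does so; the (E) variants are not defined here and are not covered.

```python
import numpy as np


def matting_errors(pred, target):
    """Return SAD, MSE and MAD between two alpha mattes with values in [0, 1].

    SAD is returned unscaled; divide by 1000 if the benchmark you compare
    against follows that convention (an assumption, not confirmed here).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    diff = pred - target
    return {
        "SAD": float(np.abs(diff).sum()),    # sum of absolute differences
        "MSE": float((diff ** 2).mean()),    # mean squared error
        "MAD": float(np.abs(diff).mean()),   # mean absolute difference
    }


# Tiny 2x2 example: one pixel is half wrong, one pixel is fully wrong.
pred = np.array([[0.0, 0.5], [1.0, 1.0]])
target = np.array([[0.0, 1.0], [1.0, 0.0]])
errors = matting_errors(pred, target)
```

In the example, the absolute differences are 0, 0.5, 0 and 1, giving SAD = 1.5, MAD = 0.375 and MSE = 0.3125. Lower is better for all three.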
