Takmaz Ayça ; Fedele Elisabetta ; Sumner Robert W. ; Pollefeys Marc ; Tombari Federico ; Engelmann Francis

Abstract
We introduce the task of open-vocabulary 3D instance segmentation. Current approaches for 3D instance segmentation can typically only recognize object categories from a pre-defined closed set of classes that are annotated in the training datasets. This results in important limitations for real-world applications where one might need to perform tasks guided by novel, open-vocabulary queries related to a wide variety of objects. Recently, open-vocabulary 3D scene understanding methods have emerged to address this problem by learning queryable features for each point in the scene. While such a representation can be directly employed to perform semantic segmentation, existing methods cannot separate multiple object instances. In this work, we address this limitation, and propose OpenMask3D, which is a zero-shot approach for open-vocabulary 3D instance segmentation. Guided by predicted class-agnostic 3D instance masks, our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings. Experiments and ablation studies on ScanNet200 and Replica show that OpenMask3D outperforms other open-vocabulary methods, especially on the long-tail distribution. Qualitative experiments further showcase OpenMask3D's ability to segment object properties based on free-form queries describing geometry, affordances, and materials.
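To make the per-mask aggregation and querying idea concrete, here is a minimal sketch, not the authors' implementation. It assumes that each instance mask already has one image embedding per selected view, that fusion is a plain average of normalised embeddings (the abstract does not specify the fusion rule), and that queries are scored by cosine similarity. The function names `aggregate_mask_features` and `rank_masks` are hypothetical, and the small vectors stand in for real CLIP embeddings.

```python
import numpy as np


def aggregate_mask_features(view_embeddings):
    """Fuse per-view embeddings into one unit-length feature per mask.

    view_embeddings: dict mapping mask id -> array-like of shape (n_views, dim).
    Assumption: fusion is a simple mean of L2-normalised view embeddings.
    """
    features = {}
    for mask_id, emb in view_embeddings.items():
        emb = np.asarray(emb, dtype=np.float64)
        # Normalise each view embedding so every view contributes equally.
        emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        mean = emb.mean(axis=0)
        # Re-normalise so a dot product with a unit query is a cosine similarity.
        features[mask_id] = mean / np.linalg.norm(mean)
    return features


def rank_masks(mask_features, query_embedding):
    """Return (mask id, cosine similarity) pairs, best match first."""
    q = np.asarray(query_embedding, dtype=np.float64)
    q = q / np.linalg.norm(q)
    scores = {m: float(f @ q) for m, f in mask_features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-ins for image embeddings of two instance masks seen in two views each.
views = {
    0: [[1.0, 0.0], [0.9, 0.1]],
    1: [[0.0, 1.0], [0.1, 0.9]],
}
mask_features = aggregate_mask_features(views)
ranking = rank_masks(mask_features, query_embedding=[1.0, 0.0])
```

In this toy setup the query vector points in the same direction as mask 0's views, so mask 0 ranks first. Because every instance mask carries its own feature, two objects of the same category can be retrieved separately, which per-point features alone do not provide.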
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-open-vocabulary-instance-segmentation-on | OpenMask3D | AP Common: 14.1; AP Head: 17.1; AP Tail: 14.9; AP25: 23.1; AP50: 19.9; mAP: 15.4 |
| 3d-open-vocabulary-instance-segmentation-on-1 | OpenMask3D | mAP: 13.1 |