Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

Abstract
Recent works on open-vocabulary 3D instance segmentation show strong promise, but at the cost of slow inference speed and high computation requirements. This high computation cost is typically due to their heavy reliance on 3D CLIP features, which require computationally expensive 2D foundation models such as Segment Anything (SAM) and CLIP for multi-view aggregation into 3D. As a consequence, these methods are hard to use in many real-world applications that require both fast and accurate predictions. To this end, we propose a fast yet accurate open-vocabulary 3D instance segmentation approach, named Open-YOLO 3D, that effectively leverages only 2D object detection from multi-view RGB images for open-vocabulary 3D instance segmentation. We address this task by generating class-agnostic 3D masks for objects in the scene and associating them with text prompts. We observe that the projection of class-agnostic 3D point cloud instances already holds instance information; thus, using SAM might only add redundancy that unnecessarily increases the inference time. We empirically find that matching text prompts to 3D masks with a 2D object detector is both faster and more accurate. We validate Open-YOLO 3D on two benchmarks, ScanNet200 and Replica, under two scenarios: (i) with ground-truth masks, where labels are required for given object proposals, and (ii) with class-agnostic 3D proposals generated by a 3D proposal network. Open-YOLO 3D achieves state-of-the-art performance on both datasets while obtaining up to ~16× speedup compared to the best existing method in the literature. On the ScanNet200 validation set, Open-YOLO 3D achieves a mean average precision (mAP) of 24.7% while operating at 22 seconds per scene. Code and model are available at github.com/aminebdj/OpenYOLO3D.
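The association step described in the abstract (labeling class-agnostic 3D instances by projecting them into the RGB views and reading off 2D detections) can be illustrated with a minimal sketch. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy) and a world-to-camera pose (R, t) per view, and that an open-vocabulary 2D detector has already produced boxes labeled with the text prompts. The helper names `project_point` and `label_instances`, the view dictionary layout, and the point-count majority vote are hypothetical: this is a simplified scheme for illustration, not the authors' exact procedure, and it ignores occlusion.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
# Hypothetical box format: (x1, y1, x2, y2, label) from an open-vocabulary 2D detector.
Box = Tuple[float, float, float, float, str]


def project_point(p: Point3D, R: Sequence[Sequence[float]], t: Sequence[float],
                  fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a world point to pixels; None if it lies behind the camera."""
    # World -> camera coordinates: X_c = R p + t
    xc = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def label_instances(instances: Dict[str, List[Point3D]],
                    views: List[dict]) -> Dict[str, Optional[str]]:
    """Assign each 3D instance the label whose 2D boxes contain most of its projected points."""
    labels: Dict[str, Optional[str]] = {}
    for inst_id, points in instances.items():
        votes: Counter = Counter()
        for view in views:
            w, h = view["size"]
            for p in points:
                uv = project_point(p, view["R"], view["t"], *view["K"])
                if uv is None:
                    continue  # point is behind this camera
                u, v = uv
                if not (0 <= u < w and 0 <= v < h):
                    continue  # point falls outside the image
                for x1, y1, x2, y2, label in view["boxes"]:
                    if x1 <= u <= x2 and y1 <= v <= y2:
                        votes[label] += 1  # one vote per point inside a box
        # Instances never seen inside any box stay unlabeled
        labels[inst_id] = votes.most_common(1)[0][0] if votes else None
    return labels


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    views = [{
        "R": identity, "t": (0.0, 0.0, 0.0),
        "K": (100.0, 100.0, 50.0, 50.0), "size": (100, 100),
        "boxes": [(40, 40, 70, 60, "chair"), (10, 10, 30, 30, "lamp")],
    }]
    instances = {
        "a": [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)],
        "b": [(-0.3, -0.3, 1.0)],
        "c": [(0.0, 0.0, -1.0)],
    }
    print(label_instances(instances, views))  # {'a': 'chair', 'b': 'lamp', 'c': None}
```

Only a cheap projection and box test are needed per point, which reflects the abstract's argument that projected class-agnostic instances already carry instance information and make per-view SAM masks unnecessary for labeling.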
Benchmarks
| Benchmark | Method | mAP (%) | AP50 (%) | AP25 (%) | AP Head (%) | AP Common (%) | AP Tail (%) |
|---|---|---|---|---|---|---|---|
| 3d-open-vocabulary-instance-segmentation-on | Open-YOLO 3D | 24.7 | 31.7 | 36.2 | 27.8 | 24.3 | 21.6 |
| 3d-open-vocabulary-instance-segmentation-on-1 | Open-YOLO 3D | 23.7 | | | | | |