OpenScene: 3D Scene Understanding with Open Vocabularies
Songyou Peng, Kyle Genova, Chiyu Max Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser

Abstract
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. This zero-shot approach enables task-agnostic training and open-vocabulary queries. For example, to perform SOTA zero-shot 3D semantic segmentation it first infers CLIP features for every 3D point and later classifies them based on similarities to embeddings of arbitrary class labels. More interestingly, it enables a suite of open-vocabulary scene understanding applications that have never been done before. For example, it allows a user to enter an arbitrary text query and then see a heat map indicating which parts of a scene match. Our approach is effective at identifying objects, materials, affordances, activities, and room types in complex 3D scenes, all using a single model trained without any labeled 3D data.
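The zero-shot classification and text-query steps described above can be sketched in a few lines. The sketch below assumes per-point features co-embedded in CLIP space have already been produced by an OpenScene-style model; the feature tensor, label prompts, and CLIP variant (`ViT-B/32`) are illustrative assumptions, not the authors' released code.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Hypothetical input: dense per-point features in CLIP space,
# shape (num_points, feature_dim), predicted by the trained 3D model.
point_features = torch.randn(100_000, 512, device=device)
point_features = point_features / point_features.norm(dim=-1, keepdim=True)

# Arbitrary open-vocabulary class labels -- no fixed label set is required.
labels = ["chair", "table", "sofa", "wall", "floor"]
tokens = clip.tokenize([f"a {label} in a scene" for label in labels]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens).float()
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Cosine similarity between every 3D point and every label embedding;
# the per-point argmax yields a zero-shot semantic segmentation.
similarity = point_features @ text_features.T   # (num_points, num_labels)
segmentation = similarity.argmax(dim=-1)        # per-point label index

# A single free-form text query instead yields a per-point heat map,
# as in the open-vocabulary search application described above.
query = clip.tokenize(["somewhere to sit"]).to(device)
with torch.no_grad():
    query_feature = model.encode_text(query).float()
query_feature = query_feature / query_feature.norm(dim=-1, keepdim=True)
heatmap = (point_features @ query_feature.T).squeeze(-1)  # (num_points,)
```

Because both labels and queries are only encoded at inference time, swapping in a different vocabulary (materials, affordances, room types) requires no retraining of the 3D model.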
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-open-vocabulary-instance-segmentation-on-1 | OpenScene + Mask3D | mAP: 10.9 |