Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Abstract
We introduce Point-Bind, a 3D multi-modality model that aligns point clouds with 2D images, language, audio, and video. Guided by ImageBind, we construct a joint embedding space between 3D and multi-modalities, enabling many promising applications, e.g., any-to-3D generation, 3D embedding arithmetic, and 3D open-world understanding. On top of this, we further present Point-LLM, the first 3D large language model (LLM) following 3D multi-modal instructions. Using parameter-efficient fine-tuning techniques, Point-LLM injects the semantics of Point-Bind into pre-trained LLMs, e.g., LLaMA, requiring no 3D instruction data while exhibiting superior 3D and multi-modal question-answering capacity. We hope our work sheds light on extending 3D point clouds to multi-modality applications. Code is available at https://github.com/ZiyuGuo99/Point-Bind_Point-LLM.
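The abstract glosses over two mechanisms that are easy to make concrete. First, because Point-Bind places 3D embeddings in ImageBind's joint space, cross-modal composition reduces to vector arithmetic on L2-normalized embeddings. The sketch below is an illustration of that idea, not the actual Point-Bind API: `point_encoder` and `audio_encoder` are hypothetical encoder handles, and `gallery` is an assumed pre-computed matrix of normalized 3D embeddings.

```python
import torch
import torch.nn.functional as F

def embed(encoder, x):
    """Encode an input with any modality encoder and L2-normalize it
    onto the shared unit sphere of the joint embedding space."""
    z = encoder(x)
    return F.normalize(z, dim=-1)

def compose_and_retrieve(z_point, z_audio, gallery):
    """3D embedding arithmetic: add a point-cloud embedding and an audio
    embedding, renormalize, and retrieve the nearest 3D shape from a
    gallery of pre-computed, L2-normalized point-cloud embeddings."""
    query = F.normalize(z_point + z_audio, dim=-1)  # composed query
    sims = gallery @ query                          # cosine similarities, shape (N,)
    return sims.argmax().item()                     # index of the best match

# Illustrative usage (the encoders are placeholders, not the real API):
# z_car = embed(point_encoder, car_point_cloud)     # 3D shape of a car
# z_engine = embed(audio_encoder, engine_waveform)  # sound of an engine
# idx = compose_and_retrieve(z_car, z_engine, gallery)
```

Second, the parameter-efficient route from Point-Bind to Point-LLM can be pictured as a small trainable projector that maps a frozen 3D embedding into the LLM's hidden size, so it can be prepended as a soft prompt token while the LLM weights stay frozen. The dimensions and single-linear design below are assumptions for illustration; the actual Point-LLM adapter design may differ.

```python
import torch
import torch.nn as nn

class PointBindProjector(nn.Module):
    """Minimal sketch: project a frozen Point-Bind embedding into the
    LLM token space as one soft prompt token. `bind_dim` and `llm_dim`
    are assumed values, not confirmed hyperparameters."""
    def __init__(self, bind_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(bind_dim, llm_dim)  # the only trained weights here

    def forward(self, z_3d: torch.Tensor) -> torch.Tensor:
        # (batch, bind_dim) -> (batch, 1, llm_dim): one prompt token that is
        # prepended to the text token embeddings fed into the frozen LLM.
        return self.proj(z_3d).unsqueeze(1)
```

Only such lightweight modules would be updated during fine-tuning, which is what lets Point-LLM acquire 3D grounding without collecting 3D instruction data for the base LLM.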
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| 3D Question Answering (3D-QA) on 3D MM-Vet | Point-Bind & Point-LLM | Overall Accuracy: 23.5 |
| Generative 3D Object Classification on Objaverse | Point-Bind LLM | Average: 5.25, C: 4.50, I: 6.00 |
| Generative 3D Object Classification on ModelNet40 | Point-Bind LLM | Average: 45.81 |