PointLLM: Empowering Large Language Models to Understand Point Clouds

Xu Runsen; Wang Xiaolong; Wang Tai; Chen Yilun; Pang Jiangmiao; Lin Dahua

Abstract

The unprecedented advancements in Large Language Models (LLMs) have had a profound impact on natural language processing, but LLMs have yet to fully embrace the realm of 3D understanding. This paper introduces PointLLM, a preliminary effort to fill this gap, enabling LLMs to understand point clouds and offering a new avenue beyond 2D visual data. PointLLM understands colored object point clouds paired with human instructions and generates contextually appropriate responses, illustrating its grasp of point clouds and common sense. Specifically, it couples a point cloud encoder with a powerful LLM to effectively fuse geometric, appearance, and linguistic information. We collect a novel dataset comprising 660K simple and 70K complex point-text instruction pairs to enable a two-stage training strategy: first aligning the latent spaces, then instruction-tuning the unified model. To rigorously evaluate the perceptual and generalization capabilities of PointLLM, we establish two benchmarks, Generative 3D Object Classification and 3D Object Captioning, assessed with three methods: human evaluation, GPT-4/ChatGPT evaluation, and traditional metrics. Experimental results show that PointLLM outperforms existing 2D and 3D baselines; notably, in human-evaluated object captioning it surpasses human annotators in over 50% of the samples. Code, datasets, and benchmarks are available at https://github.com/OpenRobotLab/PointLLM.
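The abstract describes fusing a point cloud encoder with an LLM and training in two stages. The plain-Python sketch below illustrates that data flow under explicit assumptions: a hypothetical linear projector (`project_point_tokens`) maps each encoder token to the LLM embedding width, the projected point tokens are simply prepended to the embedded text prompt, and `STAGES` encodes an assumed freezing schedule (projector only in the alignment stage; projector plus LLM during instruction tuning; encoder frozen throughout). None of these names come from the PointLLM codebase, and the 660K/70K sizes are the approximate figures quoted in the abstract.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b; `weight` holds one row per output dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def project_point_tokens(point_feats: List[Vector], weight: Matrix, bias: Vector) -> List[Vector]:
    """Hypothetical projector: maps every point-encoder token into the LLM embedding space."""
    return [linear(f, weight, bias) for f in point_feats]


def build_input_sequence(point_tokens: List[Vector], prompt_embeds: List[Vector]) -> List[Vector]:
    """Fuse modalities by placing the point tokens before the embedded text prompt.

    Simplification (assumption): a real model may wrap point tokens in special
    marker tokens or insert them mid-prompt; this sketch just prepends them.
    """
    return point_tokens + prompt_embeds


@dataclass(frozen=True)
class StageConfig:
    name: str
    train_encoder: bool
    train_projector: bool
    train_llm: bool
    approx_pairs: int  # approximate dataset size quoted in the abstract


# Assumed freezing schedule for the two-stage strategy (not taken from the official code).
STAGES = [
    StageConfig("latent-space alignment", False, True, False, 660_000),
    StageConfig("instruction tuning", False, True, True, 70_000),
]


def trainable_modules(stage: StageConfig) -> List[str]:
    """List the hypothetical module names whose weights are updated in `stage`."""
    flags = [("encoder", stage.train_encoder),
             ("projector", stage.train_projector),
             ("llm", stage.train_llm)]
    return [name for name, enabled in flags if enabled]


# Usage: two point tokens of encoder width 3, projected to a toy LLM width of 2.
feats = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
W = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
b = [0.5, -0.5]
prompt = [[0.1, 0.2], [0.3, 0.4]]  # two already-embedded text tokens
seq = build_input_sequence(project_point_tokens(feats, W, b), prompt)
print(seq)  # -> [[3.5, -0.5], [1.5, 0.5], [0.1, 0.2], [0.3, 0.4]]
for stage in STAGES:
    print(stage.name, trainable_modules(stage))
```

Keeping the encoder frozen in both stages means only the small projector has to learn the cross-modal mapping at first, which is the usual motivation for an alignment stage before full instruction tuning.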

Code Repositories

openrobotlab/pointllm (official; PyTorch)
qizekun/ShapeLLM (PyTorch)
Pointcept/GPT4Point (PyTorch)

Benchmarks

3d-object-captioning-on-objaverse-1
Method | Sentence-BERT | SimCSE | GPT-4 | Correctness | Hallucination | Precision
--- | --- | --- | --- | --- | --- | ---
PointLLM-7B v1.2 | 47.47 | 48.55 | 44.85 | 3.04 | 0.66 | 82.14
PointLLM-13B v1.2 | 47.91 | 49.12 | 48.15 | 3.10 | 0.84 | 78.75

3d-question-answering-3d-qa-on-3d-mm-vet
Method | Overall Accuracy
--- | ---
PointLLM-7B v1.2 | 41.2
PointLLM-13B v1.2 | 46.6

generative-3d-object-classification-on-1 (Objaverse)
Method | Objaverse (C) | Objaverse (I) | Objaverse (Average)
--- | --- | --- | ---
PointLLM-7B v1.2 | 51.00 | 55.00 | 53.00
PointLLM-13B v1.2 | 51.50 | 56.50 | 54.00

generative-3d-object-classification-on-2 (ModelNet40)
Method | ModelNet40 (C) | ModelNet40 (I) | ModelNet40 (Average)
--- | --- | --- | ---
PointLLM-7B v1.2 | 51.82 | 53.44 | 52.63
PointLLM-13B v1.2 | 52.55 | 53.00 | 52.78
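The benchmark numbers above are internally consistent with a few simple definitions, which the sketch below checks. These definitions are assumptions inferred from the reported values, not formulas stated on this page: each classification "Average" is taken as the plain mean of the (C) and (I) scores (C and I are read here as two prompt styles, likely completion-type and instruction-type); captioning "Precision" is taken as Correctness / (Correctness + Hallucination) expressed as a percentage; and Sentence-BERT / SimCSE scores are assumed to be cosine similarities between embeddings of generated and reference captions, scaled to 0-100. The helper names are hypothetical, and the paper's own classification judging (via GPT-4/ChatGPT) is not reproduced here.

```python
import math
from typing import Dict, List, Tuple

# Reported values copied from the benchmark results above: (C, I, Average).
CLASSIFICATION: Dict[str, Tuple[float, float, float]] = {
    "Objaverse / PointLLM-7B v1.2": (51.00, 55.00, 53.00),
    "Objaverse / PointLLM-13B v1.2": (51.50, 56.50, 54.00),
    "ModelNet40 / PointLLM-7B v1.2": (51.82, 53.44, 52.63),
    "ModelNet40 / PointLLM-13B v1.2": (52.55, 53.00, 52.78),
}

# Reported captioning values: (Correctness, Hallucination, Precision).
CAPTIONING: Dict[str, Tuple[float, float, float]] = {
    "PointLLM-7B v1.2": (3.04, 0.66, 82.14),
    "PointLLM-13B v1.2": (3.10, 0.84, 78.75),
}


def prompt_type_average(c: float, i: float) -> float:
    """Assumed definition: Average is the plain mean of the C and I scores."""
    return (c + i) / 2


def caption_precision(correct: float, hallucination: float) -> float:
    """Assumed definition: share of correct items among all judged items, as a percentage."""
    return 100 * correct / (correct + hallucination)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors (assumed basis of Sentence-BERT/SimCSE scores)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


for name, (c, i, avg) in CLASSIFICATION.items():
    # Reported averages are rounded to two decimals, so allow about half a unit in the last place.
    assert math.isclose(prompt_type_average(c, i), avg, abs_tol=0.006), name

for name, (correct, halluc, precision) in CAPTIONING.items():
    recomputed = caption_precision(correct, halluc)
    # Correctness and Hallucination are themselves rounded averages, so agreement is only approximate.
    print(f"{name}: recomputed precision {recomputed:.2f} vs reported {precision:.2f}")
```

Under these assumptions every classification average matches exactly, while precision recomputes to about 82.16 and 78.68 against the reported 82.14 and 78.75; the small gaps are consistent with the per-caption averages having been rounded to two decimals before being listed.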
