Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu

Abstract
With the overwhelming trend of masked image modeling led by MAE, generative pre-training has shown remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images provides more precise supervision than its point cloud counterpart, thus helping 3D backbones gain a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results demonstrate the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.
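To make the pre-training scheme concrete, below is a minimal PyTorch sketch of the 3D-to-2D idea: patch queries conditioned on an instructed camera pose cross-attend to point cloud tokens, and a linear head regresses a view image. The module names, dimensions, and the two-angle pose encoding are illustrative assumptions for this sketch, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn

class PoseConditionedViewDecoder(nn.Module):
    """Hypothetical cross-attention decoder: point-cloud tokens -> view image.

    Patch queries initialized from a camera-pose embedding attend to point
    features from any backbone, and a linear head predicts pixel patches.
    """

    def __init__(self, dim=384, img_size=64, patch=8):
        super().__init__()
        self.img_size, self.patch = img_size, patch
        self.num_patches = (img_size // patch) ** 2
        # Embed the instructed pose (assumed here: azimuth/elevation scalars).
        self.pose_embed = nn.Linear(2, dim)
        # One learnable positional query per output image patch.
        self.queries = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=6, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, patch * patch * 3)  # RGB pixels per patch

    def forward(self, point_feats, pose):
        # point_feats: (B, N, dim) tokens from a point cloud backbone
        # pose: (B, 2) instructed viewing angles
        B = point_feats.shape[0]
        q = self.queries.expand(B, -1, -1) + self.pose_embed(pose).unsqueeze(1)
        kv = self.norm_kv(point_feats)
        out, _ = self.cross_attn(self.norm_q(q), kv, kv)
        patches = self.head(out)  # (B, num_patches, patch*patch*3)
        # Fold patches back into an image of shape (B, 3, img_size, img_size).
        g = self.img_size // self.patch
        img = patches.view(B, g, g, self.patch, self.patch, 3)
        return img.permute(0, 5, 1, 3, 2, 4).reshape(B, 3, self.img_size, self.img_size)

# Pre-training step: regress the ground-truth rendered view for the given pose.
decoder = PoseConditionedViewDecoder()
point_feats = torch.randn(4, 128, 384)   # stand-in for backbone features
pose = torch.rand(4, 2)                  # stand-in for instructed poses
target_views = torch.rand(4, 3, 64, 64)  # stand-in for rendered view images
loss = nn.functional.mse_loss(decoder(point_feats, pose), target_views)
loss.backward()
```

Because the supervision lives in pixel space rather than on unordered points, the reconstruction target is well defined per pixel, which is the "more precise supervision" the abstract refers to; the backbone producing `point_feats` can be any point cloud model, since only the decoder is pose-aware.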
Code Repositories
https://github.com/wangzy22/TAP
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3D Part Segmentation on ShapeNetPart | PointMLP + TAP | Class Average IoU: 85.2; Instance Average IoU: 86.9 |
| 3D Point Cloud Classification on ScanObjectNN | PointMLP + TAP | Overall Accuracy: 88.5 |