Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu

Abstract
With the overwhelming trend of masked image modeling led by MAE, generative pre-training has shown remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images provides more precise supervision than its point cloud counterpart, thus helping 3D backbones gain a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results demonstrate the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.
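To make the pre-training scheme concrete, below is a minimal PyTorch sketch of the 3D-to-2D idea: patch queries conditioned on an instructed camera pose cross-attend to point cloud tokens, and a linear head regresses a view image. The module names, dimensions, and the two-angle pose encoding are illustrative assumptions for this sketch, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn

class PoseConditionedViewDecoder(nn.Module):
    """Hypothetical cross-attention decoder: point-cloud tokens -> view image.

    Patch queries initialized from a camera-pose embedding attend to point
    features from any backbone, and a linear head predicts pixel patches.
    """

    def __init__(self, dim=384, img_size=64, patch=8):
        super().__init__()
        self.img_size, self.patch = img_size, patch
        self.num_patches = (img_size // patch) ** 2
        # Embed the instructed pose (assumed here: azimuth/elevation scalars).
        self.pose_embed = nn.Linear(2, dim)
        # One learnable positional query per output image patch.
        self.queries = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=6, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, patch * patch * 3)  # RGB pixels per patch

    def forward(self, point_feats, pose):
        # point_feats: (B, N, dim) tokens from a point cloud backbone
        # pose: (B, 2) instructed viewing angles
        B = point_feats.shape[0]
        q = self.queries.expand(B, -1, -1) + self.pose_embed(pose).unsqueeze(1)
        kv = self.norm_kv(point_feats)
        out, _ = self.cross_attn(self.norm_q(q), kv, kv)
        patches = self.head(out)  # (B, num_patches, patch*patch*3)
        # Fold patches back into an image of shape (B, 3, img_size, img_size).
        g = self.img_size // self.patch
        img = patches.view(B, g, g, self.patch, self.patch, 3)
        return img.permute(0, 5, 1, 3, 2, 4).reshape(B, 3, self.img_size, self.img_size)

# Pre-training step: regress the ground-truth rendered view for the given pose.
decoder = PoseConditionedViewDecoder()
point_feats = torch.randn(4, 128, 384)   # stand-in for backbone features
pose = torch.rand(4, 2)                  # stand-in for instructed poses
target_views = torch.rand(4, 3, 64, 64)  # stand-in for rendered view images
loss = nn.functional.mse_loss(decoder(point_feats, pose), target_views)
loss.backward()
```

Because the supervision lives in pixel space rather than on unordered points, the reconstruction target is well defined per pixel, which is the "more precise supervision" the abstract refers to; the backbone producing `point_feats` can be any point cloud model, since only the decoder is pose-aware.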
Code Repositories
https://github.com/wangzy22/TAP
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3D Part Segmentation on ShapeNetPart | PointMLP + TAP | Class Average IoU: 85.2; Instance Average IoU: 86.9 |
| 3D Point Cloud Classification on ScanObjectNN | PointMLP + TAP | Overall Accuracy: 88.5 |