5 months ago

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

Zhang Yunpeng; Zhu Zheng; Du Dalong

Abstract

The vision-based perception for autonomous driving has undergone a transformation from the bird-eye-view (BEV) representations to the 3D semantic occupancy. Compared with the BEV planes, the 3D semantic occupancy further provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into the local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on SemanticKITTI dataset and for LiDAR semantic segmentation on nuScenes dataset. Code is available at https://github.com/zhangyp15/OccFormer.
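The decomposition described in the abstract can be pictured with a small shape-level sketch. This is not the authors' implementation: the function names `horizontal_mix` and `dual_path_block`, the (C, X, Y, Z) axis layout, the average pooling over height, and the additive fusion are all assumptions made for illustration, and `horizontal_mix` is only a placeholder for a real 2D transformer block. The sketch shows one plausible reading, in which a local path applies a shared 2D operation to every height slice while a global path runs once on the height-collapsed BEV map.

```python
import numpy as np

def horizontal_mix(bev):
    """Placeholder for a 2D transformer block on a (C, X, Y) map.

    Assumption: each cell is blended with the mean over the horizontal
    plane, only to mimic "mixing information along the horizontal plane".
    """
    return 0.5 * bev + 0.5 * bev.mean(axis=(1, 2), keepdims=True)

def dual_path_block(voxels):
    """Hypothetical dual-path block on a (C, X, Y, Z) volume, Z = height."""
    num_slices = voxels.shape[-1]
    # Local path: the same 2D operation is applied to each height slice,
    # so no operation ever sees the full 3D volume at once.
    local = np.stack(
        [horizontal_mix(voxels[..., z]) for z in range(num_slices)], axis=-1
    )
    # Global path: collapse the height axis (average pooling is an
    # assumption), process the resulting BEV map once, then broadcast back.
    bev = voxels.mean(axis=-1)
    global_bev = horizontal_mix(bev)[..., None]
    # Fusion by plain addition is an assumption of this sketch.
    return local + global_bev

volume = np.ones((2, 4, 4, 3))
out = dual_path_block(volume)
print(out.shape)
```

The point of the sketch is the cost structure: both paths only ever run 2D operations, which is what makes the encoding cheaper than processing the full 3D volume directly.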

Code Repositories

zhangyp15/occformer
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark                                       Methodology   Metrics
3d-semantic-scene-completion-from-a-single-1    OccFormer     mIoU: 12.32
3d-semantic-scene-completion-on-kitti-360       OccFormer     mIoU: 13.81
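
For readers unfamiliar with the metric, mIoU (mean intersection over union) averages the per-class IoU between predicted and ground-truth voxel labels. The following is a minimal sketch of the generic definition, not the evaluation script used by these benchmarks; the function name `mean_iou`, the `ignore_index` value of 255, and the choice to skip classes absent from both prediction and ground truth are assumptions, and each benchmark's protocol (evaluated classes, ignored labels) may differ.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over flat label sequences (generic definition).

    Assumes every prediction lies in [0, num_classes). Voxels whose
    ground-truth label equals ignore_index are skipped (an assumption).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch counts toward the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Classes that never appear are left out of the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")

score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"mIoU: {100 * score:.2f}")
```

Benchmark tables such as the one above typically report this value as a percentage.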