5 months ago

Context and Geometry Aware Voxel Transformer for Semantic Scene Completion

Yu Zhu ; Zhang Runmin ; Ying Jiacheng ; Yu Junchen ; Hu Xiaohai ; Luo Lun ; Cao Si-Yuan ; Shen Hui-Liang

Abstract

Vision-based Semantic Scene Completion (SSC) has gained much attention due to its widespread applications in various 3D perception tasks. Existing sparse-to-dense approaches typically employ shared context-independent queries across various input images, which fails to capture distinctions among them, as the focal regions of different inputs vary, and may result in undirected feature aggregation of cross-attention. Additionally, the absence of depth information may lead to points projected onto the image plane sharing the same 2D position or similar sampling points in the feature map, resulting in depth ambiguity. In this paper, we present a novel context and geometry aware voxel transformer. It utilizes a context aware query generator to initialize context-dependent queries tailored to individual input images, effectively capturing their unique characteristics and aggregating information within the region of interest. Furthermore, it extends deformable cross-attention from 2D to 3D pixel space, enabling the differentiation of points with similar image coordinates based on their depth coordinates. Building upon this module, we introduce a neural network named CGFormer to achieve semantic scene completion. Simultaneously, CGFormer leverages multiple 3D representations (i.e., voxel and TPV) to boost the semantic and geometric representation abilities of the transformed 3D volume from both local and global perspectives. Experimental results demonstrate that CGFormer achieves state-of-the-art performance on the SemanticKITTI and SSCBench-KITTI-360 benchmarks, attaining mIoU of 16.87 and 20.05, as well as IoU of 45.99 and 48.07, respectively. Remarkably, CGFormer even outperforms approaches employing temporal images as inputs or much larger image backbone networks.
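The depth ambiguity described in the abstract can be illustrated with a small sketch. It is not taken from the CGFormer code: it assumes a plain pinhole camera, and the `project` helper and its intrinsics (`fx`, `fy`, `cx`, `cy`) are made-up illustrative values. Two points on the same viewing ray land on the same pixel, and only the added depth coordinate tells them apart.

```python
def project(point, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Project a camera-frame 3D point (x, y, z) with a pinhole model.

    Returns (u, v, d): the pixel coordinates plus the depth d = z.
    The intrinsics are hypothetical values chosen for illustration.
    """
    x, y, z = point
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v, z


# Two points on the same viewing ray, the second twice as far away.
near = (1.0, 0.5, 5.0)
far = (2.0, 1.0, 10.0)

u1, v1, d1 = project(near)
u2, v2, d2 = project(far)

# In 2D pixel space the two points are indistinguishable ...
same_2d = (u1, v1) == (u2, v2)
# ... but in (u, v, d) space they differ through the depth coordinate.
same_3d = (u1, v1, d1) == (u2, v2, d2)

print(same_2d, same_3d)
```

A 2D sampling location built from `(u, v)` alone would read the same image feature for both points, whereas a location in `(u, v, d)` space can separate them, which is the motivation the abstract gives for moving deformable cross-attention into 3D pixel space.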

Code Repositories

pkqbajng/cgformer (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
3d-semantic-scene-completion-from-a-single-1 | CGFormer | mIoU: 16.63
3d-semantic-scene-completion-from-a-single-2 | CGFormer | mIoU: 20.05
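For readers unfamiliar with the two metrics, the following is a minimal sketch of how IoU and mIoU are commonly computed in semantic scene completion. It is not the official evaluation code of either benchmark. The function names, the label `0` for empty voxels, the label `255` for ignored voxels, and the choice to skip classes that appear in neither prediction nor ground truth are all assumptions made for illustration.

```python
def completion_iou(pred, target, empty=0, ignore=255):
    """Geometric IoU: occupied vs. empty voxels, semantic class disregarded."""
    inter = union = 0
    for p, t in zip(pred, target):
        if t == ignore:
            continue  # voxels without valid ground truth are not scored
        p_occ, t_occ = p != empty, t != empty
        inter += int(p_occ and t_occ)
        union += int(p_occ or t_occ)
    return inter / union if union else 0.0


def mean_iou(pred, target, classes, ignore=255):
    """Semantic mIoU: per-class IoU averaged over the given classes."""
    ious = []
    for c in classes:
        inter = union = 0
        for p, t in zip(pred, target):
            if t == ignore:
                continue
            inter += int(p == c and t == c)
            union += int(p == c or t == c)
        if union:  # assumption: classes absent from both are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened voxel grid with two semantic classes (1 and 2).
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 255, 1]

iou = completion_iou(pred, target)
miou = mean_iou(pred, target, classes=[1, 2])
print(iou, miou)
```

On this toy grid the geometric IoU is 3/4 and the mIoU is the mean of 1/3 and 1/2. Benchmark tables such as the one above typically report these values as percentages.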
