
Abstract

The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and the semantic labels of objects in a scene from a single-view observation. Because computational cost grows rapidly with voxel resolution, most current state-of-the-art methods have to tailor their frameworks to a low-resolution representation, sacrificing detail in the prediction. Voxel resolution thus becomes one of the crucial difficulties behind the performance bottleneck. In this paper, we propose a new geometry-based strategy that embeds depth information into a low-resolution voxel representation while still encoding sufficient geometric information, e.g., room layout and object sizes and shapes, to infer the invisible areas of the scene with well structure-preserved details. To this end, we first propose a novel 3D sketch-aware feature embedding that explicitly encodes geometric information effectively and efficiently. With the 3D sketch in hand, we further devise a simple yet effective semantic scene completion framework that incorporates a light-weight 3D Sketch Hallucination module to guide the inference of occupancy and semantic labels via a semi-supervised structure prior learning strategy. We demonstrate that our proposed geometric embedding works better than the depth feature learning used in conventional SSC frameworks. Our final model consistently surpasses the state of the art on three public benchmarks while requiring only 3D volumes of 60 x 36 x 60 resolution for both input and output. The code and the supplementary material will be available at https://charlesCXK.github.io.
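
To make the resolution argument concrete, the following minimal sketch counts voxels and estimates dense-grid memory for the 60 x 36 x 60 volume mentioned above and for a volume with each side scaled up. It is only back-of-the-envelope arithmetic, not part of the proposed framework: the scale factor of 4, the single float32 channel, and the helper names `voxel_count` and `dense_grid_bytes` are assumptions made for illustration.

```python
# Back-of-the-envelope sketch: how dense voxel-grid size grows with resolution.
# The helper names, the scale factor, and the float32 single-channel layout
# are illustrative assumptions, not details taken from the paper.

def voxel_count(dims):
    """Number of voxels in a dense grid with the given (x, y, z) dimensions."""
    x, y, z = dims
    return x * y * z


def dense_grid_bytes(dims, channels=1, bytes_per_value=4):
    """Memory for one dense grid, assuming `channels` float32 values per voxel."""
    return voxel_count(dims) * channels * bytes_per_value


low_res = (60, 36, 60)   # resolution stated in the abstract
scale = 4                # hypothetical per-axis upscaling factor
high_res = tuple(d * scale for d in low_res)

# Scaling every axis by `scale` multiplies the voxel count by scale ** 3.
ratio = voxel_count(high_res) // voxel_count(low_res)

print("low-res voxels: ", voxel_count(low_res))
print("high-res voxels:", voxel_count(high_res))
print("growth factor:  ", ratio)
print("low-res bytes:  ", dense_grid_bytes(low_res))
print("high-res bytes: ", dense_grid_bytes(high_res))
```

Because the voxel count is cubic in the per-axis resolution, a 4x increase per side yields 64x as many voxels, and every dense 3D feature map in a network scales by the same factor.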
