ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
Danila Rukhovich Anna Vorontsova Anton Konushin

Abstract
In this paper, we introduce the task of multi-view RGB-based 3D object detection as an end-to-end optimization problem. To address this problem, we propose ImVoxelNet, a novel fully convolutional method of 3D object detection based on monocular or multi-view RGB images. The number of monocular images in each multi-view input can vary during training and inference; in fact, it may be different for every multi-view input. ImVoxelNet successfully handles both indoor and outdoor scenes, which makes it general-purpose. Specifically, it achieves state-of-the-art results in car detection on the KITTI (monocular) and nuScenes (multi-view) benchmarks among all methods that accept RGB images. Moreover, it surpasses existing RGB-based 3D object detection methods on the SUN RGB-D dataset. On ScanNet, ImVoxelNet sets a new benchmark for multi-view 3D object detection. The source code and the trained models are available at https://github.com/saic-vul/imvoxelnet.
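The abstract does not spell out how images are projected into voxels, so the following is only an illustrative sketch of how a voxel volume could absorb a varying number of views. It assumes pinhole cameras given as 3x4 projection matrices, a nearest-pixel feature lookup, and plain averaging over the views that actually see each voxel. The function name `project_views_to_voxels` and all parameters are hypothetical and are not taken from the ImVoxelNet repository.

```python
import numpy as np


def project_views_to_voxels(features, projections, voxel_centers):
    """Average per-view 2D features into voxels (illustrative sketch only).

    features:      list of (C, H, W) arrays, one per view; any number of views.
    projections:   list of (3, 4) matrices mapping homogeneous world points to pixels.
    voxel_centers: (N, 3) array of voxel centers in world coordinates.
    Returns an (N, C) array of averaged features and an (N,) array counting
    how many views see each voxel.
    """
    n_voxels = voxel_centers.shape[0]
    channels = features[0].shape[0]
    accum = np.zeros((n_voxels, channels), dtype=np.float64)
    counts = np.zeros(n_voxels, dtype=np.int64)
    homogeneous = np.hstack([voxel_centers, np.ones((n_voxels, 1))])  # (N, 4)

    for feat, proj in zip(features, projections):
        _, height, width = feat.shape
        uvw = homogeneous @ proj.T  # (N, 3): pixel coordinates scaled by depth
        depth = uvw[:, 2]
        in_front = depth > 1e-6
        # Use a dummy depth for points behind the camera to avoid division by zero;
        # those points are excluded by the validity mask below.
        safe_depth = np.where(in_front, depth, 1.0)
        u = np.floor(uvw[:, 0] / safe_depth).astype(np.int64)
        v = np.floor(uvw[:, 1] / safe_depth).astype(np.int64)
        valid = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
        accum[valid] += feat[:, v[valid], u[valid]].T
        counts[valid] += 1

    seen = counts > 0
    accum[seen] /= counts[seen, None]  # voxels seen by no view stay at zero
    return accum, counts


# Toy setup (assumed values): one camera at the origin looking along +z.
K = np.array([[10.0, 0.0, 4.0], [0.0, 10.0, 4.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
voxels = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, -2.0]])  # in front of / behind the camera
feat_a = np.ones((2, 8, 8))
feat_b = 3.0 * np.ones((2, 8, 8))

volume_two, counts_two = project_views_to_voxels([feat_a, feat_b], [P, P], voxels)
volume_one, counts_one = project_views_to_voxels([feat_a], [P], voxels)
```

Because the average is taken only over the views that see a voxel, the same function accepts one view or many without any change, which is the property the abstract emphasizes. The result is a fixed-size volume regardless of the number of input images.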
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-object-detection-on-dair-v2x-i | ImVoxelNet | AP\|R40 (easy): 44.8; AP\|R40 (hard): 37.6; AP\|R40 (moderate): 37.6 |
| 3d-object-detection-on-scannetv2 | ImVoxelNet (RGB only) | mAP@0.25: 48.1; mAP@0.5: 22.7 |
| monocular-3d-object-detection-on-sun-rgb-d | ImVoxelNet | AP@0.15 (10 / NYU-37): 42.69; AP@0.15 (10 / PNet-30): 48.74; AP@0.15 (NYU-37): 21.08 |