
Digging Into Self-Supervised Monocular Depth Estimation

Gabriel J. Brostow, Michael Firman, Oisin Mac Aodha, Clement Godard

Abstract

Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark.
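
To make components (i) and (iii) of the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the source frames have already been warped into the target view by some external step, and it uses a plain L1 photometric error as a simplification. The function names and arguments (`l1_photometric_error`, `min_reprojection_loss`, `warped_sources`, `unwarped_sources`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def l1_photometric_error(a, b):
    """Per-pixel L1 error between two (H, W, C) images, returned as (H, W).

    Assumption: plain L1 is used here as a simplified photometric error.
    """
    return np.abs(a - b).mean(axis=-1)


def min_reprojection_loss(target, warped_sources, unwarped_sources):
    """Sketch of a minimum reprojection loss with auto-masking.

    target:           (H, W, C) target frame
    warped_sources:   list of (H, W, C) source frames warped into the target view
    unwarped_sources: list of the same source frames without any warping
    """
    # Error of each warped source against the target: (N, H, W)
    reproj = np.stack([l1_photometric_error(target, w) for w in warped_sources])
    # Error of each raw (unwarped) source against the target: (N, H, W)
    identity = np.stack([l1_photometric_error(target, s) for s in unwarped_sources])

    # (i) Take the per-pixel minimum over sources rather than the average, so a
    # pixel occluded in one source can still be explained by another source.
    min_reproj = reproj.min(axis=0)

    # (iii) Keep a pixel only if warping explains it better than doing nothing.
    # Pixels that look the same without warping (for example a static camera or
    # an object moving with the camera) are masked out.
    mask = min_reproj < identity.min(axis=0)

    # Average the loss over the kept pixels; guard against an empty mask.
    loss = (min_reproj * mask).sum() / max(int(mask.sum()), 1)
    return loss, mask
```

Taking the minimum over sources, rather than the mean, is what lets a pixel that is hidden in one neighbouring frame avoid being penalised by that frame's large error.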

Benchmarks

Benchmark	Methodology	Metrics
monocular-depth-estimation-on-kitti-eigen-1	Monodepth2 M	absolute relative error: 0.115
monocular-depth-estimation-on-kitti-eigen-1	Monodepth2 S	absolute relative error: 0.109
monocular-depth-estimation-on-kitti-eigen-1	Monodepth2 MS	absolute relative error: 0.106
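
For readers unfamiliar with the metric in the table, absolute relative error is commonly computed as the mean of |predicted depth - true depth| / true depth over pixels with valid ground truth. The sketch below illustrates that standard definition. It assumes a ground-truth value of zero or less marks a pixel with no measurement, and it does not reproduce any benchmark-specific protocol, which this page does not describe. The function name `abs_rel_error` is hypothetical.

```python
import numpy as np


def abs_rel_error(pred, gt):
    """Mean absolute relative depth error over pixels with valid ground truth.

    Assumption: gt <= 0 marks pixels without a ground-truth measurement.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > 0  # skip pixels that have no ground-truth depth
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))
```

Lower is better: a value of 0.106 means the predicted depth is off by about 10.6% of the true depth on average.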
