Exploring Efficiency of Vision Transformers for Self-Supervised Monocular Depth Estimation

Ilya Makarov, Aleksei Karpov

Abstract

Depth estimation is a crucial task for the creation of depth maps, one of the most important components for augmented reality (AR) and other applications. However, the most widely used hardware for AR and smartphones has only sparse depth sensors with different ground truth depth acquisition methods. Thus, depth estimation models that are robust for downstream AR tasks can only be trained reliably using self-supervised learning based on camera information. Previous works in the field mostly focus on self-supervised models with pure convolutional architectures, without taking global spatial context into account. In this paper, we utilize vision transformer architectures for self-supervised monocular depth estimation and propose VTDepth, a vision transformer-based model, which provides a solution to the problem of the global spatial context. We compare various combinations of convolutional and transformer architectures for self-supervised depth estimation and show that the best combination is a transformer-based encoder with a convolutional decoder. Our experiments demonstrate the efficiency of VTDepth for self-supervised depth estimation. Our set of models achieves state-of-the-art performance for self-supervised learning on the NYUv2 and KITTI datasets. Our code is available at https://github.com/ahbpp/VTDepth.

Benchmarks

Benchmark: monocular-depth-estimation-on-kitti-eigen-1

| Methodology | Delta < 1.25 | Delta < 1.25^2 | Delta < 1.25^3 | RMSE | RMSE log | Sq Rel | Absolute relative error |
|---|---|---|---|---|---|---|---|
| VTDepthB2 (stereo supervision) | 0.904 | 0.965 | 0.983 | 4.439 | 0.178 | 0.743 | 0.099 |
| VTDepthB2 (monocular supervision) | 0.893 | 0.964 | 0.983 | 4.530 | 0.182 | 0.762 | 0.105 |
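
The metrics listed above are the ones commonly reported for monocular depth estimation. As a rough illustration of what each number measures, here is a minimal sketch using their usual definitions. It is not taken from the VTDepth code; the function name `depth_metrics` and the flat-list input format are assumptions made for this example, and the preprocessing typically applied in KITTI evaluation (depth caps, cropping, median scaling for monocular models) is not shown.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth-estimation metrics.

    `pred` and `gt` are equal-length sequences of positive depths
    (hypothetical input format chosen for this sketch).
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    n = len(gt)
    pairs = list(zip(pred, gt))

    # Absolute and squared relative error, normalised by ground truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n

    # Root mean squared error in linear and log space.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Threshold accuracy: fraction of points whose ratio max(p/g, g/p)
    # falls below 1.25, 1.25^2 and 1.25^3.
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = [sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)]

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta_1": deltas[0],
        "delta_2": deltas[1],
        "delta_3": deltas[2],
    }


if __name__ == "__main__":
    # Toy values, not data from the paper.
    print(depth_metrics([1.5, 4.0], [1.0, 4.0]))
```

Lower is better for the four error metrics, and higher is better for the three Delta thresholds, which is why the stereo-supervised row scores at least as well as the monocular one on every column.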
