Chenyang Lu, Marinus Jacobus Gerardus van de Molengraft, Gijs Dubbelman

Abstract

In this work, we research and evaluate end-to-end learning of monocular semantic-metric occupancy grid mapping from weak binocular ground truth. The network learns to predict four classes, as well as a camera to bird's eye view mapping. At the core, it utilizes a variational encoder-decoder network that encodes the front-view visual information of the driving scene and subsequently decodes it into a 2-D top-view Cartesian coordinate system. The evaluations on Cityscapes show that the end-to-end learning of semantic-metric occupancy grids outperforms the deterministic mapping approach with flat-plane assumption by more than 12% mean IoU. Furthermore, we show that the variational sampling with a relatively small embedding vector brings robustness against vehicle dynamic perturbations, and generalizability for unseen KITTI data. Our network achieves real-time inference rates of approx. 35 Hz for an input image with a resolution of 256x512 pixels and an output map with 64x64 occupancy grid cells using a Titan V GPU.
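The abstract compares methods by mean IoU, that is, the intersection over union of predicted and ground-truth cells, computed per class and then averaged over the classes. The following is a minimal sketch of that metric for a 64x64 grid with four classes, not the authors' evaluation code. The function name `mean_iou`, the `ignore_label` value of 255 for cells without ground truth, and the choice to skip classes absent from both grids are assumptions made for illustration; the abstract does not describe the exact evaluation protocol.

```python
import numpy as np


def mean_iou(pred, target, num_classes=4, ignore_label=255):
    """Mean intersection over union between two integer label grids.

    Cells whose target equals `ignore_label` (a hypothetical marker for
    cells without ground truth) are excluded from the computation.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    valid = target != ignore_label

    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = np.logical_or(p, t).sum()
        if union == 0:
            # Class absent from both grids: skip it rather than count it as 0 or 1.
            continue
        intersection = np.logical_and(p, t).sum()
        ious.append(intersection / union)

    return float(np.mean(ious)) if ious else float("nan")


# Usage on synthetic data: a random 64x64 "ground truth" and a corrupted copy.
rng = np.random.default_rng(0)
target = rng.integers(0, 4, size=(64, 64))
pred = target.copy()
pred[:8, :] = 0  # overwrite the top eight rows with class 0
print(mean_iou(pred, target))
```

Averaging per-class scores rather than pooling all cells keeps a small class from being swamped by a dominant one, which matters when most cells of a driving scene fall into one or two classes.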

