
Cross-view Transformers for real-time Map-view Semantic Segmentation

Brady Zhou, Philipp Krähenbühl

Abstract

We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. Our architecture implicitly learns a mapping from individual camera views into a canonical map-view representation using a camera-aware cross-view attention mechanism. Each camera uses positional embeddings that depend on its intrinsic and extrinsic calibration. These embeddings allow a transformer to learn the mapping across different views without ever explicitly modeling it geometrically. The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation. Our model is simple, easily parallelizable, and runs in real-time. The presented architecture performs at state-of-the-art on the nuScenes dataset, with 4x faster inference speeds. Code is available at https://github.com/bradyz/cross_view_transformers.
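The abstract describes positional embeddings derived from each camera's intrinsic and extrinsic calibration, and attention from map-view queries to image features across all cameras. The following is a minimal NumPy sketch of that idea under several assumptions: a single attention head; a linear projection `W_pos` standing in for a learned embedding network; random arrays in place of the convolutional encoder's features; and map-view queries built from a hypothetical learned embedding plus the direction from each camera's center to each map-view cell. All function and parameter names are hypothetical. This illustrates camera-aware cross-attention in general and is not the code in the authors' repository.

```python
import numpy as np

def pixel_ray_directions(K, R, h, w):
    """World-frame unit ray direction for every pixel of an h x w feature map.

    Assumes K is the 3x3 intrinsic matrix scaled to the feature-map resolution
    and R is the 3x3 camera-to-world rotation.
    """
    v, u = np.meshgrid(np.arange(h) + 0.5, np.arange(w) + 0.5, indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixel centers
    rays_cam = pix @ np.linalg.inv(K).T   # unproject: K^-1 [u, v, 1]^T
    rays_world = rays_cam @ R.T           # rotate into the world frame
    return rays_world / np.linalg.norm(rays_world, axis=-1, keepdims=True)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(bev_xyz, cams, W_pos, bev_embed):
    """Single-head attention from map-view queries to the tokens of all cameras.

    bev_xyz:   (n_q, 3) world coordinates of the map-view grid cells
    cams:      list of dicts with 'K', 'R', 't' (camera center in world) and
               'feats' (h, w, d) features from a per-view image encoder
    W_pos:     (3, d) hypothetical linear stand-in for the learned embedding
    bev_embed: (n_q, d) hypothetical learned map-view query embedding
    """
    d = W_pos.shape[1]
    values, logits = [], []
    for cam in cams:
        h, w, _ = cam["feats"].shape
        feats = cam["feats"].reshape(-1, d)
        # Camera-aware key: image feature plus an embedding of its world-frame ray.
        k = feats + pixel_ray_directions(cam["K"], cam["R"], h, w) @ W_pos
        # Camera-aware query: direction from this camera's center to each cell.
        to_cell = bev_xyz - cam["t"]
        to_cell /= np.linalg.norm(to_cell, axis=-1, keepdims=True)
        q = bev_embed + to_cell @ W_pos
        logits.append(q @ k.T / np.sqrt(d))   # (n_q, h*w) scores for this camera
        values.append(feats)
    # One softmax over the concatenated tokens of every camera.
    attn = softmax(np.concatenate(logits, axis=1), axis=1)
    return attn @ np.concatenate(values, axis=0)  # (n_q, d)

# Usage with random stand-in features: two cameras facing opposite directions.
rng = np.random.default_rng(0)
d, h, w = 16, 4, 6
K = np.array([[3.0, 0.0, w / 2], [0.0, 3.0, h / 2], [0.0, 0.0, 1.0]])
# Camera axes (x right, y down, z forward) expressed in a world frame with z up.
R_front = np.array([[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
R_back = np.diag([-1.0, -1.0, 1.0]) @ R_front  # 180-degree yaw about world z
cams = [
    {"K": K, "R": R, "t": np.array([0.0, 0.0, 1.5]), "feats": rng.normal(size=(h, w, d))}
    for R in (R_front, R_back)
]
xs = np.linspace(-10.0, 10.0, 5)
gx, gy = np.meshgrid(xs, xs, indexing="ij")
bev_xyz = np.stack([gx, gy, np.zeros_like(gx)], axis=-1).reshape(-1, 3)  # 25 ground-plane cells
W_pos = rng.normal(size=(3, d))
bev_embed = rng.normal(size=(bev_xyz.shape[0], d))
out = cross_view_attention(bev_xyz, cams, W_pos, bev_embed)
print(out.shape)  # (25, 16)
```

Because keys carry each pixel's world-frame ray and queries carry the direction from the camera to each cell, a learned embedding can let the dot product favor pixels whose rays point toward a given cell, without an explicit projection of map-view cells into the image.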

Code Repositories

valeoai/pointbev (PyTorch), mentioned in GitHub

Benchmarks

Benchmark: bird-s-eye-view-semantic-segmentation-on
Methodology: CVT
Metrics:
IoU veh - 224x480 - No vis filter - 100x100 at 0.5: 31.4
IoU veh - 224x480 - Vis filter - 100x100 at 0.5: 36.0
IoU veh - 448x800 - No vis filter - 100x100 at 0.5: 32.5
IoU veh - 448x800 - Vis filter - 100x100 at 0.5: 37.7
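The benchmark entries report vehicle IoU on the map-view grid. To illustrate how such a score is computed, here is a sketch of binary IoU between a thresholded prediction and a ground-truth mask. Several details are assumptions, not taken from the benchmark's evaluation code: the 0.5 probability threshold, the optional `valid` mask (one possible way to emulate a visibility filter), and the 200 × 200 grid in the usage, which reflects our reading of "100x100 at 0.5" as a 100 m × 100 m area at 0.5 m per cell. The function name `binary_iou` is hypothetical.

```python
import numpy as np

def binary_iou(pred_prob, target, threshold=0.5, valid=None):
    """IoU between a thresholded map-view prediction and a binary label mask.

    pred_prob: (H, W) predicted vehicle probabilities
    target:    (H, W) ground-truth occupancy (cast to bool)
    threshold: assumed probability cut-off (hypothetical default)
    valid:     optional (H, W) boolean mask of cells to score (hypothetical)
    """
    pred = np.asarray(pred_prob) >= threshold
    target = np.asarray(target, dtype=bool)
    if valid is not None:
        valid = np.asarray(valid, dtype=bool)
        pred, target = pred[valid], target[valid]   # score only the selected cells
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter / union) if union > 0 else float("nan")

# Usage on an assumed 200 x 200 grid (100 m x 100 m at 0.5 m per cell).
target = np.zeros((200, 200), dtype=bool)
target[90:110, 95:105] = True        # one vehicle footprint: 20 x 10 cells
pred = np.zeros((200, 200))
pred[95:110, 95:105] = 0.9           # prediction covers 15 x 10 of those cells
print(binary_iou(pred, target))      # 150 / 200 = 0.75
```

Evaluation code often accumulates intersections and unions over the whole validation set before dividing, rather than averaging per-sample IoU. In that setting, a function like this would return the two counts instead of their ratio.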
