BAEFormer: Bi-Directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation

Zhaoxiang Zhang, Wei Sui, Qian Zhang, Junran Peng, Yonghao He, Cong Pan

Abstract

Bird's Eye View (BEV) semantic segmentation is a critical task in autonomous driving. However, existing Transformer-based methods struggle to transform Perspective View (PV) features into BEV because their interaction mechanisms are unidirectional and posterior. To address this issue, we propose BAEFormer, a Bi-directional and Early Interaction Transformers framework consisting of (i) an early-interaction PV-BEV pipeline and (ii) a bi-directional cross-attention mechanism. Moreover, we find that the resolution of the image feature maps used in the cross-attention module has only a limited effect on final performance. Based on this key observation, we enlarge the input images and downsample the multi-view image features used for cross-interaction, which further improves accuracy while keeping the amount of computation under control. For BEV semantic segmentation on the nuScenes dataset, our method achieves state-of-the-art performance at real-time inference speed: 38.9 mIoU at 45 FPS on a single A100 GPU.
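The abstract does not spell out the architecture, so the following is only a minimal NumPy sketch of two of its ideas, not the authors' implementation. It assumes single-head attention, random weights, and no camera geometry or positional encodings, and it does not model the early-interaction pipeline. The names `avg_pool`, `bidirectional_block`, and the `params` keys are hypothetical, and 2x average pooling merely stands in for whatever downsampling BAEFormer actually uses. The sketch shows BEV queries attending to multi-view image tokens, the image tokens attending back to the BEV queries, and the image features being downsampled before this exchange.

```python
import numpy as np

# Minimal sketch, NOT the BAEFormer implementation: single-head attention,
# random weights, no camera geometry or positional encodings.

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    # Scaled dot-product attention from `queries` (N, C) to `context` (M, C),
    # with a residual connection back to the queries.
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M): cost grows with N * M
    return queries + softmax(scores, axis=-1) @ v

def avg_pool(feats, stride):
    # Hypothetical downsampling step: average-pool (V, H, W, C) multi-view
    # feature maps over stride x stride windows -> (V, H//stride, W//stride, C).
    v, h, w, c = feats.shape
    h2, w2 = h // stride, w // stride
    feats = feats[:, : h2 * stride, : w2 * stride]
    return feats.reshape(v, h2, stride, w2, stride, c).mean(axis=(2, 4))

def bidirectional_block(bev, pv_feats, params, stride=2):
    # 1) Downsample the image features before cross-attention.
    pooled = avg_pool(pv_feats, stride)
    pv_tokens = pooled.reshape(-1, pooled.shape[-1])  # flatten all views
    # 2) PV -> BEV: BEV queries gather information from image tokens.
    bev = cross_attention(bev, pv_tokens, *params["bev_from_pv"])
    # 3) BEV -> PV: image tokens are updated from the (updated) BEV queries.
    pv_tokens = cross_attention(pv_tokens, bev, *params["pv_from_bev"])
    return bev, pv_tokens

rng = np.random.default_rng(0)
C = 16
bev = rng.standard_normal((25 * 25, C))         # hypothetical 25x25 BEV query grid
pv_feats = rng.standard_normal((6, 28, 60, C))  # hypothetical 6 camera feature maps
params = {
    name: tuple(0.1 * rng.standard_normal((C, C)) for _ in range(3))
    for name in ("bev_from_pv", "pv_from_bev")
}
bev_out, pv_out = bidirectional_block(bev, pv_feats, params, stride=2)
print(bev_out.shape, pv_out.shape)  # (625, 16) (2520, 16)
```

With 2x pooling, each hypothetical 28x60 camera map shrinks to 14x30, so the BEV-to-PV score matrix is 625 x 2520 instead of 625 x 10080, a 4x reduction. This is one way downsampling the features used for cross-interaction can keep computation bounded even when the input images are enlarged.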

Benchmarks

Benchmark: bird-s-eye-view-semantic-segmentation-on
Methodology: BAEFormer
Metric: IoU veh (100x100 at 0.5)

Input size | Visibility filter | IoU veh
224x480    | No                | 36
224x480    | Yes               | 38.9
448x800    | No                | 37.8
448x800    | Yes               | 41.0
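The table reports vehicle IoU with and without a visibility filter. As a minimal sketch of how such a number can be computed, assuming binary per-cell vehicle labels on a BEV grid, a probability threshold of 0.5 (an assumption, not taken from the benchmark definition), and a hypothetical per-cell `valid` mask standing in for the visibility filter, IoU is intersection over union of predicted and ground-truth vehicle cells. The function name `vehicle_iou` is hypothetical, and how the benchmark derives its mask from nuScenes visibility annotations is not described here.

```python
import numpy as np

def vehicle_iou(prob, target, valid=None, threshold=0.5):
    # prob: predicted vehicle probabilities per BEV cell; target: 0/1 labels.
    # threshold=0.5 and the `valid` mask are assumptions for illustration.
    pred = prob >= threshold
    target = target.astype(bool)
    if valid is not None:
        # Hypothetical visibility filter: only cells marked valid are scored.
        pred, target = pred[valid], target[valid]
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter / union) if union > 0 else float("nan")

prob = np.array([[0.9, 0.8, 0.1],
                 [0.6, 0.2, 0.7],
                 [0.1, 0.1, 0.1]])
target = np.array([[1, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
valid = np.array([[True, True, True],
                  [True, True, True],
                  [True, False, True]])  # cell (2, 1) treated as low-visibility

print(vehicle_iou(prob, target))         # 0.6
print(vehicle_iou(prob, target, valid))  # 0.75
```

In this toy grid, excluding the missed low-visibility cell raises IoU from 0.6 to 0.75, which mirrors the direction of the gap between the filtered and unfiltered rows above. Evaluation code commonly accumulates intersection and union over the whole evaluation set before dividing, rather than averaging per-sample IoU; the per-grid version here is a simplification.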
