3 months ago

YOLOv12: A Breakdown of the Key Architectural Features

Mujadded Al Rabbani Alif, Muhammad Hussain

Abstract

This paper presents an architectural analysis of YOLOv12, a significant advancement in single-stage, real-time object detection that builds on the strengths of its predecessors while introducing key improvements. The model incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention, which together improve feature extraction, efficiency, and detection robustness. Like its predecessors, YOLOv12 comes in multiple model variants, offering scalable solutions for both latency-sensitive and high-accuracy applications. Experimental results show consistent gains in mean average precision (mAP) and inference speed, making YOLOv12 a compelling choice for applications in autonomous systems, security, and real-time analytics. By balancing computational efficiency and performance, YOLOv12 sets a new benchmark for real-time computer vision and facilitates deployment across diverse hardware platforms, from edge devices to high-performance clusters.
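
The efficiency argument behind two of the components named above can be illustrated with simple counting. The following is a minimal sketch, not the authors' implementation: it assumes "separable" means the usual depthwise-plus-pointwise factorisation, and that area-based attention restricts attention to equal-sized groups of tokens. The function names, the channel width (256), the 40x40 feature map, and the choice of 4 areas are all hypothetical values chosen for illustration; none of them come from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k):
    """Weight count of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def attention_pairs(num_tokens, num_areas=1):
    """Query-key pairs when tokens are split into equal areas and
    attention is computed only within each area."""
    if num_tokens % num_areas != 0:
        raise ValueError("num_tokens must be divisible by num_areas")
    per_area = num_tokens // num_areas
    return num_areas * per_area * per_area


# Illustrative (assumed) sizes: 256 channels, 7 x 7 kernel, 40 x 40 feature map.
standard = conv_params(256, 256, 7)
separable = separable_conv_params(256, 256, 7)
global_pairs = attention_pairs(40 * 40)
area_pairs = attention_pairs(40 * 40, num_areas=4)

print(f"standard 7x7 conv weights:  {standard}")
print(f"separable 7x7 conv weights: {separable}")
print(f"global attention pairs:     {global_pairs}")
print(f"4-area attention pairs:     {area_pairs}")
```

Under these assumptions the separable layer needs roughly 41 times fewer weights than the standard one, and splitting the tokens into 4 areas cuts the number of query-key pairs by a factor of 4. Actual savings in YOLOv12 depend on its real layer widths and area configuration.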

Benchmarks

Benchmark	Model	FPS (V100, b=1)	box AP
real-time-object-detection-on-coco	YOLOv12n	610 (T4)	40.6
real-time-object-detection-on-coco	YOLOv12s	383 (T4)	48.0
real-time-object-detection-on-coco	YOLOv12m	206 (T4)	52.5
real-time-object-detection-on-coco	YOLOv12l	148 (T4)	53.7
real-time-object-detection-on-coco	YOLOv12x	85 (T4)	55.2
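
To make the speed-accuracy trade-off in the table easier to read, the FPS figures can be converted to per-image latency with latency (ms) = 1000 / FPS. The sketch below does only that arithmetic on the values listed above; the variable and function names are illustrative, and the results are derived numbers, not measurements reported on this page.

```python
# FPS values copied from the benchmark table (annotated there as T4).
fps_t4 = {
    "YOLOv12n": 610,
    "YOLOv12s": 383,
    "YOLOv12m": 206,
    "YOLOv12l": 148,
    "YOLOv12x": 85,
}


def latency_ms(fps):
    """Per-image latency in milliseconds implied by a frames-per-second rate."""
    return 1000.0 / fps


latencies = {name: round(latency_ms(fps), 2) for name, fps in fps_t4.items()}

for name, ms in latencies.items():
    print(f"{name}: {ms} ms per image")
```

Read this way, moving from the n variant to the x variant trades roughly 10 ms of extra latency per image for a gain of 14.6 box AP points (40.6 to 55.2).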
