Mujadded Al Rabbani Alif, Muhammad Hussain

Abstract
This paper presents an architectural analysis of YOLOv12, a significant advancement in single-stage, real-time object detection that builds on the strengths of its predecessors while introducing key improvements. The model incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention, which together improve feature extraction, efficiency, and detection robustness. Like its predecessors, YOLOv12 comes in multiple model variants, offering scalable solutions for both latency-sensitive and high-accuracy applications. Experimental results show consistent gains in mean average precision (mAP) and inference speed, making YOLOv12 a compelling choice for applications in autonomous systems, security, and real-time analytics. By balancing computational efficiency and detection performance, YOLOv12 sets a new benchmark for real-time computer vision and facilitates deployment across diverse hardware platforms, from edge devices to high-performance clusters.
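To make the efficiency argument for the 7x7 separable convolutions concrete, the following is a minimal sketch of the weight-count arithmetic. It assumes that "separable" here means depthwise separable (a depthwise 7x7 convolution followed by a 1x1 pointwise convolution), and the channel width of 256 is a hypothetical value chosen for illustration, not a figure from the paper.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical channel width, chosen only for illustration.
channels = 256
standard = conv_params(channels, channels, 7)
separable = separable_conv_params(channels, channels, 7)

print(f"standard 7x7:  {standard:,} weights")
print(f"separable 7x7: {separable:,} weights")
print(f"reduction:     {standard / separable:.1f}x")
```

Under these assumptions the separable form needs roughly 41 times fewer weights, which suggests how a large 7x7 receptive field can be kept affordable in a real-time detector.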
Benchmarks
| Benchmark | Model | FPS (V100, b=1) | box AP |
|---|---|---|---|
| real-time-object-detection-on-coco | YOLOv12n | 610 (T4) | 40.6 |
| real-time-object-detection-on-coco | YOLOv12s | 383 (T4) | 48.0 |
| real-time-object-detection-on-coco | YOLOv12m | 206 (T4) | 52.5 |
| real-time-object-detection-on-coco | YOLOv12l | 148 (T4) | 53.7 |
| real-time-object-detection-on-coco | YOLOv12x | 85 (T4) | 55.2 |
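The FPS figures above can be read as per-image latency, which is often the more useful number for latency-sensitive deployments. The sketch below assumes that throughput at batch size 1 is simply the reciprocal of per-image latency; the dictionary name and helper function are illustrative. The table labels the metric with V100 but annotates each value with T4, so the hardware behind these numbers should be confirmed before comparing them with other models.

```python
# FPS and box AP per variant, copied from the benchmark table above.
benchmarks = {
    "YOLOv12n": (610, 40.6),
    "YOLOv12s": (383, 48.0),
    "YOLOv12m": (206, 52.5),
    "YOLOv12l": (148, 53.7),
    "YOLOv12x": (85, 55.2),
}


def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds, assuming batch size 1 and
    that images are processed strictly one after another."""
    return 1000.0 / fps


for model, (fps, box_ap) in benchmarks.items():
    print(f"{model}: {latency_ms(fps):6.2f} ms/image, box AP {box_ap}")
```

Read this way, the table shows the usual trade-off across variants: each step up in model size raises box AP while increasing the time spent per image.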