Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown
Zimeng Fang; Chao Liang; Xue Zhou; Shuyuan Zhu; Xi Li

Abstract
Multi-object tracking (MOT) has emerged as a pivotal and highly promising branch of computer vision. Classical closed-vocabulary MOT (CV-MOT) methods aim to track objects of predefined categories. Recently, some open-vocabulary MOT (OV-MOT) methods have successfully addressed the problem of tracking unknown categories. However, we found that CV-MOT and OV-MOT methods each struggle to excel at the other's task. In this paper, we present a unified framework, Associate Everything Detected (AED), that tackles CV-MOT and OV-MOT simultaneously by integrating with any off-the-shelf detector and supporting unknown categories. Unlike existing tracking-by-detection MOT methods, AED discards prior knowledge (e.g., motion cues) and relies solely on highly robust feature learning to handle complex trajectories in OV-MOT tasks while maintaining excellent performance in CV-MOT tasks. Specifically, we model the association task as a similarity decoding problem and propose a sim-decoder with an association-centric learning mechanism. The sim-decoder calculates similarities in three aspects: spatial, temporal, and cross-clip. Association-centric learning then leverages these threefold similarities to ensure that the extracted features are suitable for continuous tracking and robust enough to generalize to unknown categories. Compared with existing powerful OV-MOT and CV-MOT methods, AED achieves superior performance on TAO, SportsMOT, and DanceTrack without any prior knowledge. Our code is available at https://github.com/balabooooo/AED.
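To make the idea of association without motion cues concrete, the following is a minimal sketch of appearance-only matching between existing tracks and new detections. It is not the paper's sim-decoder: cosine similarity over fixed feature vectors and greedy one-to-one matching are stand-ins assumed here for illustration, and the names `cosine`, `associate`, and `min_sim` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def associate(track_feats, det_feats, min_sim=0.5):
    """Greedy one-to-one matching on appearance similarity only.

    Assumption: no boxes, motion model, or category labels are used;
    `min_sim` is an illustrative threshold, not a value from the paper.
    Returns ({track_index: detection_index}, [unmatched detection indices]).
    """
    # Score every (track, detection) pair and visit the best pairs first.
    pairs = sorted(
        (
            (cosine(t, d), ti, di)
            for ti, t in enumerate(track_feats)
            for di, d in enumerate(det_feats)
        ),
        reverse=True,
    )
    matches = {}
    used_dets = set()
    for sim, ti, di in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if ti in matches or di in used_dets:
            continue  # each track and detection is matched at most once
        matches[ti] = di
        used_dets.add(di)
    unmatched = [di for di in range(len(det_feats)) if di not in used_dets]
    return matches, unmatched


if __name__ == "__main__":
    tracks = [[1.0, 0.0], [0.0, 1.0]]
    detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
    print(associate(tracks, detections))
```

Unmatched detections would typically start new tracks. A global assignment solver such as the Hungarian algorithm could replace the greedy loop; the greedy version is used here only to keep the sketch dependency-free.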
Benchmarks
| Benchmark | Method | HOTA | DetA | AssA | MOTA | IDF1 |
|---|---|---|---|---|---|---|
| multi-object-tracking-on-dancetrack | AED | 66.6 | 82.0 | 54.3 | 92.2 | 69.7 |
| multi-object-tracking-on-sportsmot | AED | 79.1 | 89.4 | 70.1 | 97.1 | 81.8 |
| multiple-object-tracking-on-sportsmot | AED | 79.1 | 89.4 | 70.1 | 97.1 | 81.8 |

| Benchmark | Method | TETA | LocA | AssocA | ClsA |
|---|---|---|---|---|---|
| multi-object-tracking-on-tao | AED (RegionCLIP) | 37.0 | 56.7 | 38.1 | 16.2 |
| multi-object-tracking-on-tao | AED (Co-DETR) | 55.3 | 71.8 | 52.4 | 41.7 |
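
The reported scores can be sanity-checked against how the composite metrics are defined. TETA is the arithmetic mean of LocA, AssocA, and ClsA, so the TAO rows should reproduce exactly up to rounding. HOTA is a geometric mean of detection and association accuracy computed per localization threshold and then averaged, so the square root of the aggregate DetA times AssA is only an approximation. The sketch below uses the values from the benchmark table; the helper names and tolerances are assumptions chosen for illustration.

```python
import math

# Values copied from the benchmark table.
tao_rows = {
    "AED (RegionCLIP)": {"TETA": 37.0, "LocA": 56.7, "AssocA": 38.1, "ClsA": 16.2},
    "AED (Co-DETR)": {"TETA": 55.3, "LocA": 71.8, "AssocA": 52.4, "ClsA": 41.7},
}
hota_rows = {
    "DanceTrack": {"HOTA": 66.6, "DetA": 82.0, "AssA": 54.3},
    "SportsMOT": {"HOTA": 79.1, "DetA": 89.4, "AssA": 70.1},
}


def teta_from_parts(loc_a, assoc_a, cls_a):
    """TETA as the arithmetic mean of its three sub-scores."""
    return (loc_a + assoc_a + cls_a) / 3.0


def hota_estimate(det_a, ass_a):
    """Rough HOTA estimate from aggregate scores.

    Approximate only: HOTA averages sqrt(DetA * AssA) over thresholds,
    which differs from the square root of the averaged scores.
    """
    return math.sqrt(det_a * ass_a)


for name, row in tao_rows.items():
    recomputed = teta_from_parts(row["LocA"], row["AssocA"], row["ClsA"])
    print(f"{name}: reported TETA {row['TETA']}, recomputed {recomputed:.1f}")

for name, row in hota_rows.items():
    estimate = hota_estimate(row["DetA"], row["AssA"])
    print(f"{name}: reported HOTA {row['HOTA']}, estimate {estimate:.1f}")
```

The TETA values match the table to one decimal place, while the HOTA estimates land slightly above the reported figures, as expected from the per-threshold averaging.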