HyperAI超神经

Video Object Detection on ImageNet VID

Evaluation Metric

MAP

Evaluation Results

Performance of each model on this benchmark

| Model | MAP | Paper Title | Repository |
| --- | --- | --- | --- |
| YOLOV | 87.5 | YOLOV: Making Still Image Object Detectors Great at Video Object Detection | |
| SELSA (ResNet-101) | 82.69 | Sequence Level Semantics Aggregation for Video Object Detection | |
| REPP + SELSA (ResNet-101) | 84.2 | Robust and Efficient Post-Processing for Video Object Detection (REPP) | |
| BoxMask (ResNet-50) | 80.7 | BoxMask: Revisiting Bounding Box Supervision for Video Object Detection | - |
| SELSA (ResNeXt-101) | 84.3 | Sequence Level Semantics Aggregation for Video Object Detection | |
| YOLOV++ | 93.2 | Practical Video Object Detection via Feature Selection and Aggregation | |
| Ours (Faster RCNN + R101) | 87.2 | Objects do not disappear: Video object detection by single-frame object location anticipation | |
| Ours (Def. DETR + SwinB) | 91.3 | Objects do not disappear: Video object detection by single-frame object location anticipation | |
| DiffusionVID (ResNet-101) | 87.1 | DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | |
| SparseVOD (ResNet-50) | 80.3 | Spatio-Temporal Learnable Proposals for End-to-End Video Object Detection | - |
| Online TSM | 76.3 | TSM: Temporal Shift Module for Efficient Video Understanding | |
| DiffusionVID (Swin-B) | 92.5 | DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | |
| LSTS (ResNet-101) | 81.7 | Learning Where to Focus for Efficient Video Object Detection | |
| REPP + YOLOv3 | 75.1 | Robust and Efficient Post-Processing for Video Object Detection (REPP) | |
| Tracklet-Conditioned Detection+DCNv2+FGFA | 83.5 | Integrated Object Detection and Tracking with Tracklet-Conditioned Detection | - |
| TransVOD (Swin Base) | 90.1 | TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers | |
| ClipVID | 85.8 | Identity-Consistent Aggregation for Video Object Detection | |
| MEGA (ResNeXt101) | 85.4 | Memory Enhanced Global-Local Aggregation for Video Object Detection | |
| PTSEFormer (ResNet-101) | 88.1 | PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection | |
| Looking Fast and Slow | 63.9 | Looking Fast and Slow: Memory-Guided Mobile Video Object Detection | |