Moment Retrieval on QVHighlights
Metrics
R@1 IoU=0.5
R@1 IoU=0.7
mAP
mAP@0.5
mAP@0.75
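The R@1 metrics listed above are conventionally read as the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal IoU at or above the stated threshold. The following is a minimal sketch of that reading, not the benchmark's official evaluation code. It assumes one ground-truth moment per query and moments given as `(start, end)` pairs in seconds; the names `temporal_iou` and `recall_at_1` are hypothetical, and the official scoring may differ (for example, in how queries with several ground-truth moments are handled).

```python
from typing import Sequence, Tuple

# A moment is a (start, end) pair in seconds. This representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: Sequence[Span], gts: Sequence[Span],
                iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    if len(top1_preds) != len(gts):
        raise ValueError("need exactly one top-1 prediction per ground-truth moment")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Two toy queries: the first prediction has IoU 0.8, the second IoU 0.6.
preds = [(0.0, 10.0), (2.0, 8.0)]
truth = [(0.0, 8.0), (0.0, 10.0)]
r1_at_05 = recall_at_1(preds, truth, 0.5)  # both predictions count as hits
r1_at_07 = recall_at_1(preds, truth, 0.7)  # only the first prediction counts
```

With these toy inputs, both predictions clear the 0.5 threshold but only one clears 0.7, which is why R@1 IoU=0.7 is never higher than R@1 IoU=0.5 for the same model.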
Results
Performance results of various models on this benchmark
| Model Name | R@1 IoU=0.5 | R@1 IoU=0.7 | mAP | mAP@0.5 | mAP@0.75 | Paper Title | Repository |
|---|---|---|---|---|---|---|---|
| SG-DETR | 72.20 | 56.60 | 54.10 | 73.20 | 55.80 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection | - |
| LLMEPET | 66.73 | 49.94 | 44.05 | 65.76 | 43.91 | Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | - |
| DenoiseLoc | 59.27 | 45.07 | - | - | - | Boundary-Denoising for Video Activity Localization | - |
| BAM-DETR | 62.71 | 48.64 | 45.36 | 64.57 | 46.33 | BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | - |
| UMT | - | - | 36.12 | - | - | UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | - |
| VideoLights-B-pt | 70.36 | 55.25 | 47.94 | 69.53 | 49.17 | VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | - |
| UniVTG (w/ PT) | 65.43 | 50.06 | 43.63 | 64.06 | 45.02 | UniVTG: Towards Unified Video-Language Temporal Grounding | - |
| UVCOM (w/ PT ASR Captions) | 64.53 | 48.31 | 43.8 | 64.78 | 43.65 | Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | - |
| QD-DETR (only Video) | 62.40 | 44.98 | 39.86 | 62.52 | 39.88 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | - |
| R^2-Tuning | 68.03 | 49.35 | 46.17 | 69.04 | 47.56 | $R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | - |
| FlashVTG | 70.69 | 53.96 | 52.00 | 72.33 | 53.85 | FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | - |
| SeViLA-Localizer | 54.5 | 36.5 | 32.3 | - | - | - | - |
| QD-DETR (w/ audio) | 63.06 | 45.10 | 40.19 | 63.04 | 40.10 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | - |
| BAM-DETR (w/ audio) | 64.07 | 48.12 | 46.91 | 65.61 | 47.51 | BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | - |
| CG-DETR | 65.43 | 48.38 | 42.86 | 64.51 | 42.77 | Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | - |
| UnLoc-L | 66.1 | 46.7 | - | - | - | UnLoc: A Unified Framework for Video Localization Tasks | - |
| LD-DETR | 66.80 | 51.04 | 46.41 | 67.61 | 46.99 | LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | - |
| LA-DETR | 63.94 | 51.10 | 47.93 | 65.65 | 49.44 | Length-Aware DETR for Robust Moment Retrieval | - |
| LLaVA-MR | 76.59 | 61.48 | 52.73 | 69.41 | 54.40 | LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval | - |
| BM-DETR | 60.12 | 43.05 | 40.08 | 63.08 | 40.18 | Background-aware Moment Detection for Video Moment Retrieval | - |