VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
Dhiman Paul, Md Rizwan Parvez, Nabeel Mohammed, Shafin Rahman

Abstract
Video Highlight Detection and Moment Retrieval (HD/MR) are essential in video analysis. Recent joint prediction transformer models often overlook their cross-task dynamics and video-text alignment and refinement. Moreover, most models typically use limited, uni-directional attention mechanisms, resulting in weakly integrated representations and suboptimal performance in capturing the interdependence between video and text modalities. Although large-language and vision-language models (LLM/LVLMs) have gained prominence across various domains, their application in this field remains relatively underexplored. Here we propose VideoLights, a novel HD/MR framework addressing these limitations through (i) Convolutional Projection and Feature Refinement modules with an alignment loss for better video-text feature alignment, (ii) a Bi-Directional Cross-Modal Fusion network for strongly coupled query-aware clip representations, and (iii) a Uni-directional joint-task feedback mechanism enhancing both tasks through correlation. In addition, (iv) we introduce hard positive/negative losses for adaptive error penalization and improved learning, and (v) leverage LVLMs like BLIP-2 for enhanced multimodal feature integration and intelligent pretraining using synthetic data generated from LVLMs. Comprehensive experiments on QVHighlights, TVSum, and Charades-STA benchmarks demonstrate state-of-the-art performance. Codes and models are available at https://github.com/dpaul06/VideoLights.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| highlight-detection-on-qvhighlights | VideoLights-B-pt | Hit@1: 70.56; mAP: 42.84 |
| moment-retrieval-on-charades-sta | VideoLights-B-pt | R@1 IoU=0.3: 73.33; R@1 IoU=0.5: 61.96; R@1 IoU=0.7: 41.05; mIoU: 52.94 |
| moment-retrieval-on-qvhighlights | VideoLights-B-pt | R@1 IoU=0.5: 70.36; R@1 IoU=0.7: 55.25; mAP: 47.94; mAP@0.5: 69.53; mAP@0.75: 49.17 |
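
The moment-retrieval rows report R@1 at several IoU thresholds. As a rough illustration of how such a number is usually computed, the sketch below measures the temporal overlap between a predicted span and a ground-truth span, then counts the share of queries whose top-1 prediction clears the threshold. It is not taken from the VideoLights code: the names `temporal_iou` and `recall_at_1` are hypothetical, and it assumes a single ground-truth moment per query with spans given as (start, end) in seconds.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) pair; this representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span], threshold: float) -> float:
    """Percentage of queries whose top-1 span reaches the IoU threshold.

    Assumes exactly one ground-truth span per query (a simplification).
    """
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)
```

Under this reading, "R@1 IoU=0.7: 41.05" would mean that about 41% of queries have a top-ranked moment overlapping the annotated one with IoU of at least 0.7; the benchmarks' official evaluation scripts may differ in details such as handling multiple ground-truth moments.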