4 months ago

VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval

Dhiman Paul, Md Rizwan Parvez, Nabeel Mohammed, Shafin Rahman

Abstract

Video Highlight Detection and Moment Retrieval (HD/MR) are essential in video analysis. Recent joint prediction transformer models often overlook their cross-task dynamics and video-text alignment and refinement. Moreover, most models typically use limited, uni-directional attention mechanisms, resulting in weakly integrated representations and suboptimal performance in capturing the interdependence between video and text modalities. Although large-language and vision-language models (LLM/LVLMs) have gained prominence across various domains, their application in this field remains relatively underexplored. Here we propose VideoLights, a novel HD/MR framework addressing these limitations through (i) Convolutional Projection and Feature Refinement modules with an alignment loss for better video-text feature alignment, (ii) Bi-Directional Cross-Modal Fusion network for strongly coupled query-aware clip representations, and (iii) Uni-directional joint-task feedback mechanism enhancing both tasks through correlation. In addition, (iv) we introduce hard positive/negative losses for adaptive error penalization and improved learning, and (v) leverage LVLMs like BLIP-2 for enhanced multimodal feature integration and intelligent pretraining using synthetic data generated from LVLMs. Comprehensive experiments on QVHighlights, TVSum, and Charades-STA benchmarks demonstrate state-of-the-art performance. Codes and models are available at https://github.com/dpaul06/VideoLights.
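To make the idea of bi-directional cross-modal attention concrete, here is a minimal sketch in plain Python. It is not the VideoLights fusion network: it only illustrates the generic pattern in which clip features attend over query-word features and word features attend over clip features. The function names (`cross_attend`, `bidirectional_cross_attention`) and the toy features are hypothetical, and the learned projections, multiple heads, and normalization layers a real model would use are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention without learned projections (an assumption).

    Each vector in `queries` is replaced by a weighted average of the
    vectors in `context`, weighted by query-context similarity.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)]
        )
    return attended


def bidirectional_cross_attention(clips, words):
    """Run attention in both directions and return both outputs."""
    text_aware_clips = cross_attend(clips, words)  # video attends to text
    clip_aware_words = cross_attend(words, clips)  # text attends to video
    return text_aware_clips, clip_aware_words


# Toy features: three clips and one query word, each of dimension 2.
clips = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [[0.5, 0.5]]
text_aware_clips, clip_aware_words = bidirectional_cross_attention(clips, words)
```

With a single query word, every clip's attention weight on that word is 1, so each text-aware clip vector equals the word vector. A uni-directional design would compute only one of the two outputs.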

Code Repositories

dpaul06/VideoLights (official implementation, PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| highlight-detection-on-qvhighlights | VideoLights-B-pt | Hit@1: 70.56; mAP: 42.84 |
| moment-retrieval-on-charades-sta | VideoLights-B-pt | R@1 IoU=0.3: 73.33; R@1 IoU=0.5: 61.96; R@1 IoU=0.7: 41.05; mIoU: 52.94 |
| moment-retrieval-on-qvhighlights | VideoLights-B-pt | R@1 IoU=0.5: 70.36; R@1 IoU=0.7: 55.25; mAP: 47.94; mAP@0.5: 69.53; mAP@0.75: 49.17 |
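
For readers unfamiliar with the moment-retrieval metrics above, the sketch below shows how R@1 at an IoU threshold and mIoU are commonly computed from temporal intersection-over-union. It is an illustration, not the official evaluation script of any benchmark: the function names are hypothetical, and it assumes one ground-truth moment and one top-ranked prediction per query, each given as a `(start, end)` pair in seconds.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) time intervals."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(preds, gts, threshold):
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(1 for p, g in zip(preds, gts) if temporal_iou(p, g) >= threshold)
    return 100.0 * hits / len(gts)


def mean_iou(preds, gts):
    """Average IoU of the top-1 predictions, as a percentage."""
    return 100.0 * sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(gts)


# Toy example: three queries with IoUs of 1.0, 1/3, and 0.0.
preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
gts = [(0.0, 10.0), (10.0, 20.0), (40.0, 50.0)]
r1_at_05 = recall_at_1(preds, gts, 0.5)  # one of three queries is a hit
miou = mean_iou(preds, gts)
```

A stricter threshold such as IoU=0.7 counts only tightly localized predictions as hits, which is why R@1 in the table drops as the threshold rises.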
