$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Liu Ye ; He Jixuan ; Li Wanhua ; Kim Junsik ; Wei Donglai ; Pfister Hanspeter ; Chen Chang Wen

Abstract
Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries. Most existing VTG models are built upon frame-wise final-layer CLIP features, aided by additional temporal backbones (e.g., SlowFast) with sophisticated temporal reasoning mechanisms. In this work, we claim that CLIP itself already shows great potential for fine-grained spatial-temporal modeling, as each layer offers distinct yet useful information under different granularity levels. Motivated by this, we propose Reversed Recurrent Tuning ($R^2$-Tuning), a parameter- and memory-efficient transfer learning framework for video temporal grounding. Our method learns a lightweight $R^2$ Block containing only 1.5% of the total parameters to perform progressive spatial-temporal modeling. Starting from the last layer of CLIP, $R^2$ Block recurrently aggregates spatial features from earlier layers, then refines temporal correlation conditioning on the given query, resulting in a coarse-to-fine scheme. $R^2$-Tuning achieves state-of-the-art performance across three VTG tasks (i.e., moment retrieval, highlight detection, and video summarization) on six public benchmarks (i.e., QVHighlights, Charades-STA, Ego4D-NLQ, TACoS, YouTube Highlights, and TVSum) even without the additional backbone, demonstrating the significance and effectiveness of the proposed scheme. Our code is available at https://github.com/yeliudev/R2-Tuning.
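The reversed, recurrent traversal described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the real $R^2$ Block is a learned module, whereas the version below has no trainable parameters and swaps in mean pooling for spatial aggregation and a cosine-similarity gate for query conditioning. The names `reversed_recurrent_pass`, `mean_pool`, `cosine`, and the `momentum` parameter are all hypothetical, chosen only to show the control flow (last layer first, one shared update rule reused for every layer).

```python
import math

def mean_pool(patches):
    """Average a list of patch vectors into a single frame vector."""
    dim = len(patches[0])
    return [sum(p[d] for p in patches) / len(patches) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)

def reversed_recurrent_pass(layer_feats, query, momentum=0.5):
    """Toy stand-in for a reversed recurrent pass over encoder layers.

    layer_feats[k][t] is a list of patch vectors for frame t at layer k
    (k = 0 is the earliest layer). Returns per-frame relevance scores and
    the order in which layers were visited.
    """
    num_frames = len(layer_feats[0])
    state = [[0.0] * len(query) for _ in range(num_frames)]
    visited = []
    for k in reversed(range(len(layer_feats))):  # start from the last layer
        visited.append(k)
        # Spatial step (assumption): pool patches into one vector per frame.
        pooled = [mean_pool(layer_feats[k][t]) for t in range(num_frames)]
        # Query-conditioned step (assumption): gate each frame by its
        # similarity to the query, mapped to the range [0, 1].
        gates = [0.5 * (1.0 + cosine(p, query)) for p in pooled]
        # Recurrent update: the same rule is reused at every layer.
        state = [
            [momentum * s + (1.0 - momentum) * gate * x
             for s, x in zip(state[t], pooled[t])]
            for t, gate in enumerate(gates)
        ]
    scores = [cosine(s, query) for s in state]
    return scores, visited

# Toy input: 3 layers, 2 frames, 2 patches per frame, feature dimension 2.
query = [1.0, 0.0]
frame_a = [[1.0, 0.0], [1.0, 0.0]]  # aligned with the query
frame_b = [[0.0, 1.0], [0.0, 1.0]]  # orthogonal to the query
layer_feats = [[frame_a, frame_b] for _ in range(3)]

scores, order = reversed_recurrent_pass(layer_feats, query)
```

Because a single update rule is shared across layers, the amount of added state does not grow with encoder depth, which is the intuition behind the parameter-efficiency claim; the actual parameter count quoted in the abstract refers to the authors' learned block, not to this sketch.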
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| highlight-detection-on-qvhighlights | $R^2$-Tuning | Hit@1: 64.20; mAP: 40.75 |
| moment-retrieval-on-qvhighlights | $R^2$-Tuning | R@1 IoU=0.5: 68.03; R@1 IoU=0.7: 49.35; mAP: 46.17; mAP@0.5: 69.04; mAP@0.75: 47.56 |
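For readers unfamiliar with the moment-retrieval metrics in the table, the following minimal sketch shows how temporal IoU and R@1 at an IoU threshold are commonly computed. It assumes one ground-truth span per query and spans given as `(start, end)` in seconds; the function names and the example spans are hypothetical, and the official benchmark evaluation scripts may differ in details such as handling multiple ground-truth windows.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) time spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_1(top1_preds, gts, iou_threshold):
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(
        temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)

# Made-up spans purely for illustration (not benchmark data).
preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 50.0)]
gts = [(12.0, 22.0), (0.0, 5.0), (45.0, 60.0)]

r1_at_05 = recall_at_1(preds, gts, 0.5)  # two of three predictions qualify
r1_at_07 = recall_at_1(preds, gts, 0.7)  # only the exact match qualifies
```

A stricter threshold can only lower or preserve the score, which is consistent with the table reporting a lower R@1 at IoU=0.7 than at IoU=0.5.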