5 months ago

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

Liu Ye ; He Jixuan ; Li Wanhua ; Kim Junsik ; Wei Donglai ; Pfister Hanspeter ; Chen Chang Wen

Abstract

Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries. Most existing VTG models are built upon frame-wise final-layer CLIP features, aided by additional temporal backbones (e.g., SlowFast) with sophisticated temporal reasoning mechanisms. In this work, we claim that CLIP itself already shows great potential for fine-grained spatial-temporal modeling, as each layer offers distinct yet useful information under different granularity levels. Motivated by this, we propose Reversed Recurrent Tuning ($R^2$-Tuning), a parameter- and memory-efficient transfer learning framework for video temporal grounding. Our method learns a lightweight $R^2$ Block containing only 1.5% of the total parameters to perform progressive spatial-temporal modeling. Starting from the last layer of CLIP, $R^2$ Block recurrently aggregates spatial features from earlier layers, then refines temporal correlation conditioning on the given query, resulting in a coarse-to-fine scheme. $R^2$-Tuning achieves state-of-the-art performance across three VTG tasks (i.e., moment retrieval, highlight detection, and video summarization) on six public benchmarks (i.e., QVHighlights, Charades-STA, Ego4D-NLQ, TACoS, YouTube Highlights, and TVSum) even without the additional backbone, demonstrating the significance and effectiveness of the proposed scheme. Our code is available at https://github.com/yeliudev/R2-Tuning.
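The control flow described in the abstract (start at the last layer, walk backwards through earlier layers, reuse one update rule at every step, and refine over time using the query) can be illustrated with a toy sketch. This is not the $R^2$ Block from the paper or the official repository: the function name `reversed_recurrent_sketch`, the blending weight `mix`, the softmax-weighted context step, and the `layer_feats[layer][frame]` layout are all assumptions made purely for illustration.

```python
import math

# Toy illustration only. This is NOT the authors' R^2 Block; every name and
# update rule below is a hypothetical stand-in for the idea in the abstract.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reversed_recurrent_sketch(layer_feats, query, num_layers=2, mix=0.5):
    """layer_feats[l][t] is the feature vector of frame t at layer l (assumed layout).
    query is one vector of the same dimension. Returns one vector per frame."""
    selected = layer_feats[-num_layers:]          # only the last K layers are used
    num_frames, dim = len(selected[0]), len(query)
    hidden = [[0.0] * dim for _ in range(num_frames)]
    for feats in reversed(selected):              # last layer first, then earlier ones
        # Aggregation step: blend this layer's per-frame features into the state.
        hidden = [[(1 - mix) * h + mix * f for h, f in zip(hv, fv)]
                  for hv, fv in zip(hidden, feats)]
        # Query-conditioned temporal step: weight frames by similarity to the
        # query, then add the weighted summary back to every frame.
        weights = softmax([dot(hv, query) for hv in hidden])
        context = [sum(w * hv[d] for w, hv in zip(weights, hidden))
                   for d in range(dim)]
        hidden = [[h + c for h, c in zip(hv, context)] for hv in hidden]
    return hidden
```

Because the same update is applied at every step, the number of learned parameters in a real implementation would not grow with the number of layers visited, which is consistent with the small parameter budget the abstract reports.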

Code Repositories

yeliudev/R2-Tuning
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
highlight-detection-on-qvhighlights | R^2-Tuning | Hit@1: 64.20; mAP: 40.75
moment-retrieval-on-qvhighlights | R^2-Tuning | R@1 IoU=0.5: 68.03; R@1 IoU=0.7: 49.35; mAP: 46.17; mAP@0.5: 69.04; mAP@0.75: 47.56
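For readers unfamiliar with the moment-retrieval metrics, the sketch below shows how R@1 at an IoU threshold is commonly computed: the top-ranked predicted moment counts as a hit when its temporal IoU with the ground-truth moment reaches the threshold. The function names and the one-ground-truth-per-query simplification are assumptions; the official benchmark evaluation scripts may differ in details such as how multiple ground-truth moments are handled.

```python
# Minimal sketch of temporal IoU and R@1; moments are (start, end) in seconds.
# Hypothetical helper names; not taken from any benchmark's evaluation code.

def temporal_iou(pred, gt):
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_1(top1_preds, gts, threshold):
    """Percentage of queries whose top-1 prediction has IoU >= threshold.
    Assumes exactly one ground-truth moment per query."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)
```

As a worked case, a prediction of (0, 10) against a ground truth of (5, 15) overlaps for 5 seconds out of a 15-second union, giving an IoU of 1/3, which would miss both the 0.5 and 0.7 thresholds.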
