Progressive Video Summarization via Multimodal Self-supervised Learning
Haopeng Li; Qiuhong Ke; Mingming Gong; Tom Drummond

Abstract
Modern video summarization methods are based on deep neural networks, which require a large amount of annotated data for training. However, existing video summarization datasets are small-scale, which easily leads to overfitting of deep models. Since annotating large-scale datasets is time-consuming, we propose a multimodal self-supervised learning framework to obtain semantic representations of videos that benefit the video summarization task. Specifically, self-supervised learning is conducted by exploring the semantic consistency between videos and text at both coarse-grained and fine-grained levels, as well as by recovering masked frames in the videos. The multimodal framework is trained on a newly collected dataset of video-text pairs. Additionally, we introduce a progressive video summarization method, in which the important content of a video is pinpointed progressively to generate better summaries. Extensive experiments demonstrate the effectiveness and superiority of our method in terms of rank correlation coefficients and F-score.
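The abstract names two self-supervised signals (video-text consistency and masked-frame recovery) but does not give their loss functions. The sketch below is a minimal, hypothetical illustration assuming a common choice for each: a symmetric InfoNCE contrastive loss for coarse-grained (video-level) video-text consistency, and a mean-squared-error loss computed only on the masked frames. The function names, the temperature of 0.07, and the use of plain Python lists as embeddings are assumptions for illustration, not the authors' implementation; the fine-grained alignment and the progressive summarization stage are not covered.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def _log_softmax_at(logits: Sequence[float], idx: int) -> float:
    """log(softmax(logits)[idx]), computed stably with the max-shift trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - log_sum_exp


def video_text_infonce(video_emb, text_emb, temperature=0.07):
    """Hypothetical coarse-grained consistency loss (symmetric InfoNCE, assumed).

    Video i and text i form the positive pair; every other pairing in the
    batch acts as a negative. Returns the mean of both directions.
    """
    if len(video_emb) != len(text_emb) or not video_emb:
        raise ValueError("need equally sized, non-empty batches")
    v = [_l2_normalize(x) for x in video_emb]
    t = [_l2_normalize(x) for x in text_emb]
    n = len(v)
    # sim[i][j] = cosine(video i, text j) / temperature
    sim = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
           for i in range(n)]
    video_to_text = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    text_to_video = -sum(_log_softmax_at([sim[j][i] for j in range(n)], i)
                         for i in range(n)) / n
    return 0.5 * (video_to_text + text_to_video)


def masked_frame_mse(reconstructed, original, mask):
    """Hypothetical masked-frame recovery loss: MSE over hidden frames only.

    mask[i] is True if frame i was masked out before reconstruction.
    """
    errors = [sum((r - o) ** 2 for r, o in zip(rec, orig)) / len(orig)
              for rec, orig, hidden in zip(reconstructed, original, mask) if hidden]
    return sum(errors) / len(errors) if errors else 0.0
```

With matched pairs on the diagonal the contrastive loss is close to zero, and it grows sharply when the texts are shuffled, which is the behavior a video-text consistency objective relies on. Restricting the reconstruction error to masked frames keeps the model from being rewarded for copying frames it could already see.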
Benchmarks
| Benchmark | Method | F1-score, Augmented (%) | F1-score, Canonical (%) | Kendall's Tau | Spearman's Rho |
|---|---|---|---|---|---|
| Supervised video summarization on SumMe | SSPVS | 50.4 | 48.7 | 0.178 | 0.240 |
| Supervised video summarization on SumMe | SSPVS (+Text) | n/a | 50.7 | 0.192 | 0.257 |
| Supervised video summarization on TVSum | SSPVS (+Text) | n/a | 60.4 | 0.181 | 0.238 |
| Supervised video summarization on TVSum | SSPVS | 61.8 | 60.3 | 0.177 | 0.233 |