Adopting Self-Supervised Learning into Unsupervised Video Summarization through Restorative Score.

Parvaneh Saeedi, Mehryar Abbasi

Abstract

In this paper, we present a new process for creating video summaries in an unsupervised manner. Our approach trains a transformer encoder, in a self-supervised way, to reconstruct the missing frames of a video from a partially masked version of that video. We then introduce an algorithm that uses the trained encoder to generate an importance score for each frame; these scores are used to create the video summary. We show that the model's reconstruction loss on a video with masked frames correlates with the representativeness of the remaining frames. We validate our approach on two benchmark datasets, TVSum and SumMe, and demonstrate that it outperforms state-of-the-art (SOTA) methods. Additionally, our approach is more stable during training than SOTA techniques based on generative adversarial learning. Our source code is publicly available.
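The link between reconstruction loss and frame importance can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the random masking scheme, the `mask_ratio` and `n_trials` parameters, and the way loss is attributed to visible frames are all assumptions made for illustration, and `mean_fill_reconstructor` is a hypothetical stand-in for the trained transformer encoder.

```python
import numpy as np


def mean_fill_reconstructor(frames, visible):
    """Hypothetical stand-in for the trained encoder (an assumption):
    fills every masked frame with the mean of the visible frames."""
    restored = frames.copy()
    restored[~visible] = frames[visible].mean(axis=0)
    return restored


def restorative_scores(frames, reconstruct, n_trials=200, mask_ratio=0.5, seed=0):
    """Toy importance score: for many random masks, measure the
    reconstruction loss on the masked frames and credit it to the frames
    that stayed visible. A frame whose presence tends to coincide with
    low loss receives a high score."""
    rng = np.random.default_rng(seed)
    n = len(frames)
    # Keep at least one frame masked and at least one frame visible.
    n_masked = max(1, min(n - 1, int(round(mask_ratio * n))))
    loss_sum = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(n_trials):
        visible = np.ones(n, dtype=bool)
        visible[rng.choice(n, size=n_masked, replace=False)] = False
        restored = reconstruct(frames, visible)
        # Mean squared error over the masked frames only.
        loss = np.mean((restored[~visible] - frames[~visible]) ** 2)
        loss_sum[visible] += loss
        counts[visible] += 1
    mean_loss = loss_sum / np.maximum(counts, 1)
    return -mean_loss  # higher score = lower loss when the frame is visible


# Usage: ten frames with 1-D features, one of which differs from the rest.
frames = np.zeros((10, 1))
frames[3] = 10.0
scores = restorative_scores(frames, mean_fill_reconstructor)
```

In practice, `reconstruct` would wrap the trained encoder and `frames` would hold per-frame feature vectors; the resulting scores would then feed whatever summary-selection step is used downstream.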

Benchmarks

Benchmark	Methodology	Metrics
unsupervised-video-summarization-on-summe	RS-SUM	F1-score: 52.0
unsupervised-video-summarization-on-tvsum	RS-SUM	F1-score: 61.4; Kendall's Tau: 0.08; Spearman's Rho: 0.106