8 months ago

Peihao Chen¹*†, Deng Huang¹†, Dongliang He², Xiang Long², Runhao Zeng¹, Shilei Wen², Mingkui Tan¹‡, Chuang Gan³

Abstract

We study unsupervised video representation learning, which seeks to learn both motion and appearance features from unlabeled video only, so that the features can be reused for downstream tasks such as action recognition. This task is extremely challenging because of 1) the highly complex spatial-temporal information in videos and 2) the lack of labeled data for training. Unlike representation learning for static images, it is difficult to construct a self-supervised task that models both motion and appearance features well. Recently, several attempts have been made to learn video representations through playback speed prediction. However, it is non-trivial to obtain precise speed labels for videos. More critically, the learned models may tend to focus on motion patterns and thus may not learn appearance features well. In this paper, we observe that relative playback speed is more consistent with motion patterns and thus provides more effective and stable supervision for representation learning. We therefore propose a new way to perceive playback speed that uses the relative speed between two video clips as labels. In this way, the model perceives speed well and learns better motion features. Moreover, to ensure that appearance features are learned, we further propose an appearance-focused task in which the model is required to perceive the appearance difference between two video clips. We show that jointly optimizing the two tasks consistently improves performance on two downstream tasks, namely action recognition and video retrieval. Remarkably, for action recognition on the UCF101 dataset, we achieve 93.7% accuracy without using labeled data for pre-training, which outperforms the ImageNet supervised pre-trained model. Code and pre-trained models can be found at https://github.com/PeihaoChen/RSPNet.
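
To make the idea of relative speed labels concrete, here is a minimal sketch of how a pair of clips and its label could be built from one unlabeled video. It is not the authors' implementation (see the repository above for that). The function names `sample_clip`, `relative_speed_label`, and `make_pair`, the candidate speeds `(1, 2, 4)`, the clip length of 16 frames, and the three-way label encoding are all assumptions chosen for illustration; playback speed is modeled simply as the frame stride used when sampling.

```python
import random


def sample_clip(num_frames, clip_len, speed, rng):
    """Return frame indices for a clip sampled with stride `speed`.

    A larger stride skips more frames, which mimics faster playback.
    """
    span = (clip_len - 1) * speed + 1  # number of source frames covered
    if span > num_frames:
        raise ValueError("video too short for this clip length and speed")
    start = rng.randrange(num_frames - span + 1)
    return [start + i * speed for i in range(clip_len)]


def relative_speed_label(speed_a, speed_b):
    """Hypothetical encoding: 0 = a slower, 1 = same speed, 2 = a faster."""
    if speed_a < speed_b:
        return 0
    if speed_a == speed_b:
        return 1
    return 2


def make_pair(num_frames, clip_len=16, speeds=(1, 2, 4), rng=None):
    """Sample two clips from one video and label them by relative speed.

    The label depends only on how the two speeds compare, so no
    absolute speed annotation of the source video is required.
    """
    rng = rng or random.Random()
    speed_a = rng.choice(speeds)
    speed_b = rng.choice(speeds)
    clip_a = sample_clip(num_frames, clip_len, speed_a, rng)
    clip_b = sample_clip(num_frames, clip_len, speed_b, rng)
    return clip_a, clip_b, relative_speed_label(speed_a, speed_b)
```

A call such as `make_pair(300)` returns two lists of frame indices and one label. In a training pipeline the indices would be used to load frames, and the label would supervise a model that compares the two clips. The appearance-focused task described in the abstract would need a second kind of pair (for example, clips from the same video versus clips from different videos), which this sketch does not cover.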

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
