Modeling the Relative Visual Tempo for Self-supervised Skeleton-based Action Recognition

Guangcan Liu, Zhengtao Yu, Hu Han, Yisheng Zhu

Abstract

Visual tempo characterizes the dynamics and temporal evolution of motion, which helps describe actions. Recent approaches directly predict visual tempo on skeleton sequences, which may lead to insufficient feature representations. In this paper, we observe that relative visual tempo is more in line with human intuition and thus provides more effective supervision signals. Based on this observation, we propose a novel Relative Visual Tempo Contrastive Learning framework for skeleton action Representation (RVTCLR). Specifically, we design a Relative Visual Tempo Learning (RVTL) task to explore motion information across clips drawn from the same video, and an Appearance-Consistency (AC) task to learn appearance information simultaneously, resulting in more representative spatiotemporal features. Furthermore, skeleton sequence data is much sparser than RGB data, which lets the network learn shortcuts and overfit to low-level information such as skeleton scale. To learn high-order semantics, we further design a new Distribution-Consistency (DC) branch containing three components: Skeleton-specific Data Augmentation (SDA), a Fine-grained Skeleton Encoding Module (FSEM), and a Distribution-aware Diversity (DD) Loss. We call the complete method (RVTCLR with DC) RVTCLR+. Extensive experiments on the NTU RGB+D 60 and NTU RGB+D 120 datasets demonstrate that RVTCLR+ achieves results competitive with state-of-the-art methods. Code is available at https://github.com/Zhuysheng/RVTCLR.
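To make the idea of a relative-tempo pretext task concrete, here is a minimal sketch of how a training pair might be built. It is an illustrative assumption, not the authors' RVTL implementation: two clips are sampled from the same skeleton sequence with different temporal strides (a larger stride plays the motion back faster), and the hypothetical target is a 3-way label saying whether clip A is slower than, as fast as, or faster than clip B. The frame layout, the stride set `(1, 2, 4)`, and the helper names `sample_clip`, `relative_tempo_label`, and `make_tempo_pair` are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: a frame is a list of (x, y, z) joint coordinates,
# and a skeleton sequence is a list of frames.
Frame = List[Tuple[float, float, float]]


def sample_clip(seq: Sequence[Frame], start: int, stride: int, length: int) -> List[Frame]:
    """Take `length` frames from `start`, stepping by `stride` frames.

    A larger stride skips more frames, so the clip plays back faster.
    """
    end = start + stride * (length - 1)
    if start < 0 or end >= len(seq):
        raise ValueError("sequence too short for this start/stride/length")
    return [seq[start + i * stride] for i in range(length)]


def relative_tempo_label(stride_a: int, stride_b: int) -> int:
    """Hypothetical 3-way target: 0 = A slower than B, 1 = same tempo, 2 = A faster."""
    if stride_a < stride_b:
        return 0
    if stride_a == stride_b:
        return 1
    return 2


def make_tempo_pair(
    seq: Sequence[Frame],
    length: int,
    strides: Tuple[int, ...] = (1, 2, 4),
    rng: Optional[random.Random] = None,
) -> Tuple[List[Frame], List[Frame], int]:
    """Sample two clips of the same sequence at random tempos plus their relative label."""
    rng = rng or random.Random()
    stride_a, stride_b = rng.choice(strides), rng.choice(strides)
    clips = []
    for stride in (stride_a, stride_b):
        # Last valid start index so the final sampled frame stays in range.
        max_start = len(seq) - 1 - stride * (length - 1)
        if max_start < 0:
            raise ValueError("sequence too short for this stride/length")
        clips.append(sample_clip(seq, rng.randint(0, max_start), stride, length))
    return clips[0], clips[1], relative_tempo_label(stride_a, stride_b)


if __name__ == "__main__":
    # Toy sequence: 64 frames, one joint whose x coordinate equals the frame index.
    toy = [[(float(t), 0.0, 0.0)] for t in range(64)]
    a, b, label = make_tempo_pair(toy, length=8, rng=random.Random(0))
    print(len(a), len(b), label)
```

In a full self-supervised pipeline, one would presumably encode both clips with a backbone such as ST-GCN and train a small head to predict this label alongside a contrastive objective. Comparing two clips of the same video, rather than regressing an absolute speed for one clip, is what makes the target "relative."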

Benchmarks

Benchmark: self-supervised-human-action-recognition-on
Methodology: 3s-RVTCLR+ (Classifier: FC; Encoder: ST-GCN)
Metrics: xset 68.9%, xsub 68.0%