PA3D: Pose-Action 3D Machine for Video Recognition

Yu Qiao, Zhifeng Li, Yali Wang, An Yan

Abstract

Recent studies have demonstrated the success of 3D CNNs for video action recognition. However, most 3D models are built upon RGB and optical flow streams, which may not fully exploit pose dynamics, an important cue for modeling human actions. To fill this gap, we propose a concise Pose-Action 3D Machine (PA3D), which effectively encodes multiple pose modalities within a unified 3D framework and consequently learns spatio-temporal pose representations for action recognition. More specifically, we introduce a novel temporal pose convolution to aggregate spatial poses over frames. Unlike classical temporal convolution, our operation explicitly learns the pose motions that are discriminative for recognizing human actions. Extensive experiments on three popular benchmarks (JHMDB, HMDB, and Charades) show that PA3D outperforms recent pose-based approaches. Furthermore, PA3D is highly complementary to recent 3D CNNs, e.g., I3D. Multi-stream fusion achieves state-of-the-art performance on all evaluated datasets.
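
The abstract describes temporal pose convolution only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes each frame provides one heatmap per joint and treats the operation as a 1x1 convolution across the time axis: every output map is a weighted sum of the same joint's heatmaps over all frames. The function name `temporal_pose_conv`, the nested-list layout, and the hand-picked weights are all hypothetical; in a trained model the temporal weights would be learned.

```python
from typing import List

Heatmap = List[List[float]]  # H x W grid for one joint in one frame


def temporal_pose_conv(frames: List[List[Heatmap]],
                       weights: List[List[float]]) -> List[List[Heatmap]]:
    """Aggregate per-frame joint heatmaps over time (illustrative only).

    frames:  T frames, each a list of J joint heatmaps of size H x W.
    weights: M temporal filters, each holding T weights (assumed layout).
    returns: maps indexed as [filter][joint][y][x].
    """
    num_frames = len(frames)
    num_joints = len(frames[0])
    height = len(frames[0][0])
    width = len(frames[0][0][0])

    output = []
    for temporal_filter in weights:
        if len(temporal_filter) != num_frames:
            raise ValueError("each filter needs one weight per frame")
        per_joint = []
        for j in range(num_joints):
            # Weighted sum over time at every spatial location of joint j.
            grid = [
                [
                    sum(temporal_filter[t] * frames[t][j][y][x]
                        for t in range(num_frames))
                    for x in range(width)
                ]
                for y in range(height)
            ]
            per_joint.append(grid)
        output.append(per_joint)
    return output


# One joint moving from the left cell to the right cell across two frames.
frames = [
    [[[1.0, 0.0]]],  # frame 0: joint at x = 0
    [[[0.0, 1.0]]],  # frame 1: joint at x = 1
]
difference_filter = [[-1.0, 1.0]]  # hand-picked, responds to motion
motion_maps = temporal_pose_conv(frames, difference_filter)
```

With the difference filter, the output is negative where the joint left and positive where it arrived, which is one simple way a temporal filter over pose heatmaps can encode motion rather than static position.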

Benchmarks

Benchmark | Methodology | Metrics
Action Classification on Charades | PA3D + (GCN + I3D + NL I3D) | mAP: 41
Skeleton-Based Action Recognition on J-HMDB | PA3D | Accuracy (RGB+pose): 69.5
Skeleton-Based Action Recognition on J-HMDB | PA3D+RPAN | Accuracy (RGB+pose): 86.1
