Interpretable 3D Human Action Analysis with Temporal Convolutional Networks
Tae Soo Kim, Austin Reiter

Abstract
The discriminative power of modern deep learning models for 3D human action recognition keeps growing. Together with the recent resurgence of 3D human action representation using 3D skeletons, both the quality and the pace of recent progress have been significant. However, the inner workings of state-of-the-art learning-based methods for 3D human action recognition still remain largely a black box. In this work, we propose using a new class of models, Temporal Convolutional Neural Networks (TCN), for 3D human action recognition. Compared to popular LSTM-based Recurrent Neural Network models, TCN, given interpretable input such as 3D skeletons, provides a way to explicitly learn readily interpretable spatio-temporal representations for 3D human action recognition. We describe our strategy for re-designing the TCN with interpretability in mind and show how these characteristics of the model are leveraged to construct a powerful 3D activity recognition method. Through this work, we aim to take a step toward a spatio-temporal model that is easier to understand, explain, and interpret. The resulting model, Res-TCN, achieves state-of-the-art results on the largest 3D human action recognition dataset, NTU-RGBD.
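The core idea in the abstract, a temporal convolution applied directly to per-frame 3D skeleton vectors so that first-layer filter weights line up with individual joint coordinates, can be illustrated with a small pure-Python sketch. It assumes each frame is flattened joint-major as (x, y, z) per joint, e.g. 25 joints x 3 coordinates = 75 channels for an NTU-RGBD body, and it uses a simplified residual unit (ReLU, then temporal convolution, then an identity skip) as a stand-in for a Res-TCN block. Batch normalization, striding, pooling and the real Res-TCN layer sizes are omitted, and all function names are hypothetical; the per-joint score is a simple sum-of-absolute-weights heuristic, not the paper's interpretability analysis.

```python
import random


def temporal_conv1d(x, weights, bias):
    """1D convolution over time with zero 'same' padding (odd kernel size K).

    x:       T frames, each a list of D input channels
    weights: C_out filters, each a list of D channels, each a list of K taps
    bias:    C_out biases
    returns: T frames, each a list of C_out outputs
    """
    T = len(x)
    K = len(weights[0][0])
    pad = K // 2
    out = []
    for t in range(T):
        frame = []
        for c, filt in enumerate(weights):
            acc = bias[c]
            for d, taps in enumerate(filt):
                for k, w in enumerate(taps):
                    src = t + k - pad  # frame index this tap reads
                    if 0 <= src < T:   # zero padding outside the sequence
                        acc += w * x[src][d]
            frame.append(acc)
        out.append(frame)
    return out


def relu(x):
    return [[max(0.0, v) for v in frame] for frame in x]


def residual_unit(x, weights, bias):
    """Simplified residual unit: x + conv(relu(x)). Requires C_out == D."""
    h = temporal_conv1d(relu(x), weights, bias)
    return [[a + b for a, b in zip(xf, hf)] for xf, hf in zip(x, h)]


def joint_importance(weights, num_joints, coords=3):
    """Heuristic: total |weight| per joint, assuming channel d = joint * coords + axis."""
    scores = [0.0] * num_joints
    for filt in weights:
        for d, taps in enumerate(filt):
            scores[d // coords] += sum(abs(w) for w in taps)
    return scores


if __name__ == "__main__":
    random.seed(0)
    NUM_JOINTS, COORDS, T, K = 25, 3, 8, 3  # toy sizes, not Res-TCN's
    D = NUM_JOINTS * COORDS
    skeleton = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
    W = [[[random.gauss(0, 0.1) for _ in range(K)] for _ in range(D)] for _ in range(D)]
    b = [0.0] * D
    y = residual_unit(skeleton, W, b)
    print(len(y), len(y[0]))  # prints "8 75"
    scores = joint_importance(W, NUM_JOINTS, COORDS)
    print("most weighted joint:", max(range(NUM_JOINTS), key=scores.__getitem__))
```

Because the input channels are raw joint coordinates rather than learned embeddings, any first-layer filter can be read back as a weighting over joints and time steps, which is the property the abstract contrasts with LSTM hidden states.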
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| multimodal-activity-recognition-on-ev-action | TCN (Skeleton Kinect) | Accuracy: 80.1% |
| multimodal-activity-recognition-on-ev-action | TCN (Skeleton Vicon) | Accuracy: 64.1% |
| skeleton-based-action-recognition-on-ntu-rgbd | TCN | Accuracy (CS): 74.3% Accuracy (CV): 83.1% |
| skeleton-based-action-recognition-on-varying | Res-TCN | Accuracy (AV I): 48% Accuracy (AV II): 68% Accuracy (CS): 63% Accuracy (CV I): 14% Accuracy (CV II): 48% |