DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition
Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin

Abstract
Graph convolution networks (GCNs) have been widely used in skeleton-based action recognition. We note that existing GCN-based approaches primarily rely on prescribed graphical structures (i.e., a manually defined topology of skeleton joints), which limits their flexibility to capture complicated correlations between joints. To move beyond this limitation, we propose a new framework for skeleton-based action recognition, namely Dynamic Group Spatio-Temporal GCN (DG-STGCN). It consists of two modules, DG-GCN and DG-TCN, for spatial and temporal modeling, respectively. In particular, DG-GCN uses learned affinity matrices to capture dynamic graphical structures instead of relying on a prescribed one, while DG-TCN performs group-wise temporal convolutions with varying receptive fields and incorporates a dynamic joint-skeleton fusion module for adaptive multi-level temporal modeling. On a wide range of benchmarks, including NTU RGB+D, Kinetics-Skeleton, BABEL, and Toyota SmartHome, DG-STGCN consistently outperforms state-of-the-art methods, often by a notable margin.
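To make the idea of replacing a prescribed skeleton graph with learned affinity matrices more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a single frame, splits the feature channels into contiguous groups, and gives each group its own V x V matrix that would be a trainable parameter in a real model. The function names `make_affinity` and `group_graph_aggregate` are hypothetical, and the data-dependent ("dynamic") refinement the abstract mentions is left out.

```python
import random


def make_affinity(num_joints, rng):
    """Return a randomly initialised V x V matrix.

    Assumption: in a real model this would be a trainable parameter,
    not tied to any hand-defined skeleton topology.
    """
    return [[rng.gauss(0.0, 0.1) for _ in range(num_joints)]
            for _ in range(num_joints)]


def group_graph_aggregate(x, affinities):
    """Mix joint features with one affinity matrix per channel group.

    x          -- list of C channels, each a list of V joint values (one frame)
    affinities -- list of K matrices (V x V); C must be divisible by K
    Channel c belongs to group c // (C // K).
    """
    num_channels, num_groups = len(x), len(affinities)
    if num_channels % num_groups != 0:
        raise ValueError("channel count must be divisible by group count")
    per_group = num_channels // num_groups

    out = []
    for c, channel in enumerate(x):
        a = affinities[c // per_group]
        # Each output joint v is a weighted sum over all input joints u.
        out.append([sum(a[v][u] * channel[u] for u in range(len(channel)))
                    for v in range(len(a))])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    num_joints, num_channels, num_groups = 5, 4, 2
    features = [[rng.random() for _ in range(num_joints)]
                for _ in range(num_channels)]
    mats = [make_affinity(num_joints, rng) for _ in range(num_groups)]
    mixed = group_graph_aggregate(features, mats)
    print(len(mixed), len(mixed[0]))
```

Because the matrices are dense and unconstrained, any joint can influence any other joint, which is the flexibility the abstract contrasts with a manually defined topology.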
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| skeleton-based-action-recognition-on-ntu-rgbd | DG-STGCN | Accuracy (CS): 93.2; Accuracy (CV): 97.5; Ensembled Modalities: 4 |
| skeleton-based-action-recognition-on-ntu-rgbd-1 | DG-STGCN | Accuracy (Cross-Setup): 91.3; Accuracy (Cross-Subject): 89.6; Ensembled Modalities: 4 |