Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation
Chao Li; Qiaoyong Zhong; Di Xie; Shiliang Pu

Abstract
Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation for joint co-occurrences and the inter-frame representation for skeletons' temporal evolutions. In this paper we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology, in which different levels of contextual information are aggregated gradually. First, point-level information of each joint is encoded independently. Then these point-level features are assembled into semantic representations in both the spatial and temporal domains. Specifically, we introduce a global spatial aggregation scheme, which learns better joint co-occurrence features than local aggregation. In addition, raw skeleton coordinates and their temporal differences are integrated with a two-stream paradigm. Experiments show that our approach consistently outperforms other state-of-the-art methods on action recognition and detection benchmarks such as NTU RGB+D, SBU Kinect Interaction and PKU-MMD.
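The abstract names three mechanisms: independent point-level encoding of each joint, global spatial aggregation in which every joint can interact with every other joint, and a two-stream input of raw coordinates plus their temporal differences. The pure-Python sketch below illustrates these ideas on nested lists. It is a toy under stated assumptions, not the authors' HCN implementation: all function names are hypothetical, and the zero-padded last motion frame, the 1x1 kernels, the missing nonlinearities and pooling, and the concatenation-based fusion are simplifications chosen for illustration.

```python
from typing import List

# Illustrative shape convention: a sequence is T frames x J joints x C values.
Tensor3 = List[List[List[float]]]
Matrix = List[List[float]]


def temporal_difference(seq: Tensor3) -> Tensor3:
    """Motion-stream input: frame t+1 minus frame t for every joint.
    Assumption: the last frame is zero-padded so the output keeps T frames."""
    T, J, C = len(seq), len(seq[0]), len(seq[0][0])
    motion = [[[seq[t + 1][j][c] - seq[t][j][c] for c in range(C)]
               for j in range(J)]
              for t in range(T - 1)]
    motion.append([[0.0] * C for _ in range(J)])
    return motion


def point_level(seq: Tensor3, W: Matrix) -> Tensor3:
    """Encode each joint independently with one shared C -> D linear map
    (what a 1x1 convolution computes); nonlinearities are omitted.
    W is D x C. Returns T x J x D."""
    return [[[sum(w * x for w, x in zip(w_row, joint)) for w_row in W]
             for joint in frame]
            for frame in seq]


def global_aggregation(feat: Tensor3, A: Matrix) -> Tensor3:
    """Treat the J joints as input channels (a transpose of T x J x D to
    T x D x J) and mix them with a K x J weight matrix, so every output
    depends on ALL joints. A 1x1 kernel is used for brevity. Returns T x D x K."""
    T, J, D = len(feat), len(feat[0]), len(feat[0][0])
    return [[[sum(A[k][j] * feat[t][j][d] for j in range(J))
              for k in range(len(A))]
             for d in range(D)]
            for t in range(T)]


def local_aggregation(feat: Tensor3, k: int = 3) -> Tensor3:
    """Contrast case: each joint only sums the features of a window of k
    joints around it (by joint index), so distant joints never interact
    within one layer. Returns T x J x D."""
    T, J, D = len(feat), len(feat[0]), len(feat[0][0])
    r = k // 2
    return [[[sum(feat[t][i][d] for i in range(max(0, j - r), min(J, j + r + 1)))
              for d in range(D)]
             for j in range(J)]
            for t in range(T)]


def two_stream(seq: Tensor3, W_pos: Matrix, A_pos: Matrix,
               W_mot: Matrix, A_mot: Matrix) -> Tensor3:
    """Run the position stream (raw coordinates) and the motion stream
    (temporal differences) with separate weights, then concatenate their
    K-dimensional outputs at every (t, d) position (assumed fusion rule)."""
    pos = global_aggregation(point_level(seq, W_pos), A_pos)
    mot = global_aggregation(point_level(temporal_difference(seq), W_mot), A_mot)
    return [[p + m for p, m in zip(pos_t, mot_t)] for pos_t, mot_t in zip(pos, mot)]


if __name__ == "__main__":
    # Receptive-field check: 1 frame, 5 joints, 1 feature per joint.
    base = [[[0.0] for _ in range(5)]]
    moved = [[[0.0] for _ in range(4)] + [[1.0]]]  # only joint 4 changes
    # Does joint 0's local output react to the change at joint 4?
    print(local_aggregation(base)[0][0] != local_aggregation(moved)[0][0])  # prints False
    ones = [[1.0] * 5]
    # Does the globally aggregated output react to it?
    print(global_aggregation(base, ones)[0][0] != global_aggregation(moved, ones)[0][0])  # prints True
```

The `__main__` check shows the design point behind global aggregation: once the joints are treated as channels, a change at any joint reaches every output of a single layer, whereas a local window over the joint axis only lets joints that happen to be adjacent in the index ordering interact.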
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| rf-based-pose-estimation-on-rf-mmd | HCN | mAP (@0.1, Through-wall): 78.5; mAP (@0.1, Visible): 82.5 |
| skeleton-based-action-recognition-on-ntu-rgbd | HCN | Accuracy (CS): 86.5; Accuracy (CV): 91.1 |
| skeleton-based-action-recognition-on-pku-mmd | HCN | mAP@0.50 (CS): 92.6; mAP@0.50 (CV): 94.2 |