Zhu Jiagang ; Zou Wei ; Xu Liang ; Hu Yiming ; Zhu Zheng ; Chang Manyu ; Huang Junjie ; Huang Guan ; Du Dalong

Abstract
Existing methods in video action recognition mostly do not distinguish the human body from the environment and easily overfit to scenes and objects. In this work, we present a conceptually simple, general and high-performance framework for action recognition in trimmed videos, aimed at person-centric modeling. The method, called Action Machine, takes as input videos cropped by person bounding boxes. It extends the Inflated 3D ConvNet (I3D) by adding a branch for human pose estimation and a 2D CNN for pose-based action recognition, and is fast to train and test. Action Machine can benefit from multi-task training of action recognition and pose estimation, and from the fusion of predictions from RGB images and poses. On NTU RGB-D, Action Machine achieves state-of-the-art performance, with top-1 accuracies of 97.2% and 94.3% on the cross-view and cross-subject protocols respectively. Action Machine also achieves competitive performance on three other, smaller action recognition datasets: Northwestern-UCLA Multiview Action3D, MSR Daily Activity3D and UTD-MHAD. Code will be made available.
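The abstract mentions two training/inference ideas: multi-task training (action recognition plus pose estimation) and fusing predictions from the RGB stream and the pose stream. The snippet below is a minimal sketch of both under stated assumptions; it is not the authors' implementation. It assumes late fusion by a weighted average of per-class softmax probabilities and a joint loss made of one cross-entropy term per stream plus a mean-squared-error pose-heatmap term. The names `fuse_predictions`, `multitask_loss`, `rgb_weight` and `pose_weight` are hypothetical, and the abstract does not state the actual fusion rule or loss weights.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(rgb_logits: Sequence[float],
                     pose_logits: Sequence[float],
                     rgb_weight: float = 0.5) -> List[float]:
    """Assumed late fusion: weighted average of the two streams' class probabilities.

    rgb_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    p_rgb = softmax(rgb_logits)
    p_pose = softmax(pose_logits)
    return [rgb_weight * a + (1.0 - rgb_weight) * b for a, b in zip(p_rgb, p_pose)]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the ground-truth action class."""
    return -math.log(softmax(logits)[label])


def heatmap_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between flattened predicted and target pose heatmaps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(rgb_logits: Sequence[float],
                   pose_logits: Sequence[float],
                   label: int,
                   pred_heatmap: Sequence[float],
                   target_heatmap: Sequence[float],
                   pose_weight: float = 1.0) -> float:
    """Hypothetical joint objective: one action loss per stream plus a
    weighted pose-estimation loss (pose_weight is an assumed hyperparameter)."""
    action = cross_entropy(rgb_logits, label) + cross_entropy(pose_logits, label)
    return action + pose_weight * heatmap_mse(pred_heatmap, target_heatmap)


if __name__ == "__main__":
    # Toy 3-class example: the RGB stream favours class 0, the pose stream class 1.
    fused = fuse_predictions([2.0, 0.5, 0.1], [0.2, 1.5, 0.1])
    predicted = max(range(len(fused)), key=fused.__getitem__)
    print(fused, predicted)
```

Averaging probabilities rather than raw logits keeps each stream's contribution on the same scale. With equal weights, a confident RGB prediction can still outvote a less confident pose prediction, as in the toy example.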
Benchmarks
| Benchmark | Methodology | Metrics (%) |
|---|---|---|
| action-recognition-in-videos-on-ntu-rgbd | Action Machine (RGB only) | Accuracy (CS): 94.3; Accuracy (CV): 97.2 |
| action-recognition-in-videos-on-utd-mhad | Action Machine (RGB only) | Accuracy: 92.5 |
| multimodal-activity-recognition-on-msr-daily-1 | Action Machine (RGB only) | Accuracy: 93.0 |
| multimodal-activity-recognition-on-utd-mhad | Action Machine | Accuracy (CS): 92.5 |
| skeleton-based-action-recognition-on-n-ucla | Action Machine | Accuracy: 92.3 |