Command Palette
Search for a command to run...
Chenxin Xu Robby T. Tan Yuhong Tan Siheng Chen Xinchao Wang Yanfeng Wang

Abstract
Exploring spatial-temporal dependencies from observed motions is one of the core challenges of human motion prediction. Previous methods mainly focus on dedicated network structures to model the spatial and temporal dependencies. This paper considers a new direction by introducing a model learning framework with auxiliary tasks. In our auxiliary tasks, partial body joints' coordinates are corrupted by either masking or adding noise and the goal is to recover corrupted coordinates depending on the rest coordinates. To work with auxiliary tasks, we propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data and achieve coordinate recovery via capturing spatial-temporal dependencies. Through auxiliary tasks, the auxiliary-adapted transformer is promoted to capture more comprehensive spatial-temporal dependencies among body joints' coordinates, leading to better feature learning. Extensive experimental results have shown that our method outperforms state-of-the-art methods by remarkable margins of 7.2%, 3.7%, and 9.4% in terms of 3D mean per joint position error (MPJPE) on the Human3.6M, CMU Mocap, and 3DPW datasets, respectively. We also demonstrate that our method is more robust under data missing cases and noisy data cases. Code is available at https://github.com/MediaBrain-SJTU/AuxFormer.
Code Repositories
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| human-pose-forecasting-on-3dpw | AuxFormer | Average MPJPE (mm) 1000 msec: 107.45 |
| human-pose-forecasting-on-human36m | AuxFormer | Average MPJPE (mm) @ 1000 ms: 107 Average MPJPE (mm) @ 400ms: 54.1 |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.