SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers
Davoodnia Vandad ; Ghorbani Saeed ; Messier Alexandre ; Etemad Ali

Abstract
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations. This module integrates prior knowledge about pose space and infers the full pose state at runtime. Separating the 3D keypoint detection and inverse-kinematic problems, along with the expressive representations learned by our skeletal transformer, enhances the generalization of our method to unseen noisy data. We evaluate our method on three public datasets in both in-distribution and out-of-distribution settings and observe strong performance with respect to prior works. Moreover, ablation experiments demonstrate the impact of each of the modules of our architecture. Finally, we study the performance of our method in dealing with noise and heavy occlusions and find considerable robustness with respect to other solutions.
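The abstract does not say how the multi-view 2D detections are turned into 3D joint positions. One common choice in calibrated multi-view setups is linear (DLT) triangulation, sketched below for a single joint. This is an illustrative assumption, not the authors' implementation; the function names `triangulate_dlt` and `project` and the camera matrices in the usage note are hypothetical.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from two or more calibrated views (hypothetical helper).

    projections: iterable of 3x4 camera projection matrices.
    points_2d:   iterable of (u, v) pixel coordinates, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix (hypothetical helper)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]
```

As a usage example, projecting a known point into two synthetic cameras and passing the resulting pixel coordinates to `triangulate_dlt` should recover the point up to numerical precision. With noisy detections, the result is only a least-squares estimate, which is the kind of noisy input the skeletal transformer is described as handling.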
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-human-pose-estimation-on-human36m | SkelFormer (LT) | Average MPJPE (mm): 25.2; Multi-View or Monocular: Multi-View; PA-MPJPE: 20.6; Using 2D ground-truth joints: No |
| 3d-human-pose-estimation-on-human36m | SkelFormer (CPN) | Average MPJPE (mm): 33.5; Multi-View or Monocular: Multi-View; PA-MPJPE: 27.8; Using 2D ground-truth joints: No |
| 3d-human-pose-estimation-on-rich | SkelFormer (HRNet - eval only) | MPJPE: 44.2; MPVPE: 39.9; PA-MPJPE: 35.6 |
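For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how MPJPE and PA-MPJPE are commonly computed for a single pose. It assumes joints are given as `(J, 3)` arrays in millimetres; the function names are hypothetical, and the evaluation protocol used for the numbers above (joint set, root alignment, averaging over frames) is not specified in this page.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Align pred to gt with the best similarity transform (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    X, Y = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance gives the optimal rotation (Kabsch/Umeyama).
    U, S, Vt = np.linalg.svd(X.T @ Y)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.array([1.0, 1.0, d])
    R = U @ np.diag(D) @ Vt
    scale = (S * D).sum() / (X ** 2).sum()
    return scale * X @ R + mu_g

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: MPJPE after similarity alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because PA-MPJPE removes global scale, rotation, and translation before measuring error, it is never larger than the MPJPE of the best-aligned pose, which is consistent with the PA-MPJPE values in the table being lower than the corresponding MPJPE values.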