SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach
Ailing Zeng, Xiao Sun, Fuyang Huang, Minhao Liu, Qiang Xu, Stephen Lin

Abstract
Human poses that are rare or unseen in a training set are challenging for a network to predict. Similar to the long-tailed distribution problem in visual recognition, the small number of examples for such poses limits the ability of networks to model them. Interestingly, local pose distributions suffer less from the long-tail problem, i.e., local joint configurations within a rare pose may appear within other poses in the training set, making them less rare. We propose to take advantage of this fact for better generalization to rare and unseen poses. To be specific, our method splits the body into local regions and processes them in separate network branches, utilizing the property that a joint position depends mainly on the joints within its local body region. Global coherence is maintained by recombining the global context from the rest of the body into each branch as a low-dimensional vector. With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference. The proposed split-and-recombine approach, called SRNet, can be easily adapted to both single-image and temporal models, and it leads to appreciable improvements in the prediction of rare and unseen poses.
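The split-and-recombine idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the joint grouping in `GROUPS`, the context size `CONTEXT_DIM`, the single hidden layer, and the randomly initialized weights are all assumptions chosen only to show the data flow. Each branch sees its own local joints at full dimensionality, while the remaining joints are compressed into a low-dimensional context vector before being concatenated back in.

```python
import numpy as np

# Hypothetical grouping of a 17-joint skeleton into local body regions.
# The indices are illustrative and are not taken from the paper.
GROUPS = {
    "torso_head": [0, 7, 8, 9, 10],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
    "left_leg": [4, 5, 6],
    "right_leg": [1, 2, 3],
}
NUM_JOINTS = 17
CONTEXT_DIM = 4  # assumed size of the low-dimensional global context


def init_params(rng, hidden=32):
    """Create random weights for each branch (untrained, for shape checking only)."""
    params = {}
    for name, idx in GROUPS.items():
        n_local = len(idx)
        n_rest = NUM_JOINTS - n_local
        params[name] = {
            # Compresses the 2D joints outside the group into CONTEXT_DIM values.
            "ctx": rng.normal(0.0, 0.1, (n_rest * 2, CONTEXT_DIM)),
            # Processes local joints concatenated with the compressed context.
            "hid": rng.normal(0.0, 0.1, (n_local * 2 + CONTEXT_DIM, hidden)),
            # Regresses 3D coordinates for the local joints only.
            "out": rng.normal(0.0, 0.1, (hidden, n_local * 3)),
        }
    return params


def split_and_recombine(pose_2d, params):
    """Map 2D poses of shape (batch, 17, 2) to 3D poses of shape (batch, 17, 3)."""
    batch = pose_2d.shape[0]
    pose_3d = np.zeros((batch, NUM_JOINTS, 3))
    for name, idx in GROUPS.items():
        rest = [j for j in range(NUM_JOINTS) if j not in idx]
        # Split: local joints keep their full dimensionality.
        local = pose_2d[:, idx].reshape(batch, -1)
        # Recombine: the rest of the body enters as a low-dimensional vector.
        context = pose_2d[:, rest].reshape(batch, -1) @ params[name]["ctx"]
        branch_in = np.concatenate([local, context], axis=1)
        features = np.maximum(branch_in @ params[name]["hid"], 0.0)  # ReLU
        pose_3d[:, idx] = (features @ params[name]["out"]).reshape(batch, len(idx), 3)
    return pose_3d
```

Because the branch for a region receives only `CONTEXT_DIM` values from the rest of the body, its input distribution is dominated by the local joint configuration, which is the property the abstract relies on for generalizing to rare poses.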
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-human-pose-estimation-on-human36m | SRNet (T=243) | Average MPJPE (mm): 44.8; Multi-View or Monocular: Monocular; Using 2D ground-truth joints: No |
| 3d-human-pose-estimation-on-human36m | SRNet (T=1) | Average MPJPE (mm): 49.9; Multi-View or Monocular: Monocular; Using 2D ground-truth joints: No |
| 3d-human-pose-estimation-on-human36m | SRNet (T=243 GT) | Average MPJPE (mm): 32; Multi-View or Monocular: Monocular; Using 2D ground-truth joints: Yes |
| 3d-human-pose-estimation-on-human36m | SRNet (T=1 GT) | Average MPJPE (mm): 33.9; Multi-View or Monocular: Monocular; Using 2D ground-truth joints: Yes |
| 3d-human-pose-estimation-on-mpi-inf-3dhp | SRNet | AUC: 43.8; PCK: 77.6 |
| monocular-3d-human-pose-estimation-on-human3 | SRNet | Average MPJPE (mm): 49.9; Frames Needed: 1; Need Ground Truth 2D Pose: No; Use Video Sequence: No |
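
For readers unfamiliar with the metrics in the table, the sketch below shows how MPJPE and PCK are commonly computed from predicted and ground-truth 3D joints. The function names are hypothetical, the 150 mm PCK threshold is an assumption based on common practice for this kind of benchmark, and evaluation protocols may apply alignment steps (for example root-centering) that are omitted here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over all joints.

    pred, gt: arrays of shape (num_poses, num_joints, 3), in millimetres.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def pck(pred, gt, threshold_mm=150.0):
    """Percentage of joints whose error is within threshold_mm (assumed value)."""
    errors = np.linalg.norm(pred - gt, axis=-1)
    return float((errors <= threshold_mm).mean() * 100.0)
```

Lower MPJPE is better, while higher PCK is better.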