Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik

Abstract
From an image of a person in action, we can easily guess the 3D motion of the person in the immediate past and future. This is because we have a mental model of 3D human dynamics that we have acquired from observing visual sequences of humans in motion. We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features. At test time, from video, the learned temporal representation gives rise to smooth 3D mesh predictions. From a single image, our model can recover the current 3D mesh as well as its 3D past and future motion. Our approach is designed so it can learn from videos with 2D pose annotations in a semi-supervised manner. Though annotated data is always limited, there are millions of videos uploaded daily on the Internet. In this work, we harvest this Internet-scale source of unlabeled data by training our model on unlabeled video with pseudo-ground truth 2D pose obtained from an off-the-shelf 2D pose detector. Our experiments show that adding more videos with pseudo-ground truth 2D pose monotonically improves 3D prediction performance. We evaluate our model, Human Mesh and Motion Recovery (HMMR), on the recent challenging dataset of 3D Poses in the Wild and obtain state-of-the-art performance on the 3D prediction task without any fine-tuning. The project website with video, code, and data can be found at https://akanazawa.github.io/human_dynamics/.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-human-pose-estimation-on-3dpw | HMMR (T=20) | Acceleration Error: 15.2; MPJPE: 116.5; PA-MPJPE: 72.6 |
| 3d-human-pose-estimation-on-human36m | HMMR (T=20) | Average MPJPE (mm): 83.7; PA-MPJPE: 56.9 |
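
For readers unfamiliar with the metrics in the table, the following is a minimal NumPy sketch of how MPJPE, PA-MPJPE, and an acceleration error are commonly computed for a sequence of 3D joints. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the `(T, J, 3)` array layout, and the use of a second-order finite difference for acceleration are all assumptions made here, and the acceleration value is expressed per frame squared rather than in physical units (which would require the frame rate).

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over all joints and frames."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align one pose pred (J, 3) to gt (J, 3) with the best similarity transform."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Cross-covariance between the centered point sets.
    H = p.T @ g
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after per-frame Procrustes alignment of the prediction to the ground truth."""
    aligned = np.stack([procrustes_align(p, g) for p, g in zip(pred, gt)])
    return mpjpe(aligned, gt)


def accel_error(pred, gt):
    """Mean difference between predicted and ground-truth joint accelerations.

    Acceleration is approximated by a second-order finite difference over frames
    (an assumption of this sketch), so the first and last frames are dropped.
    """
    acc_p = pred[:-2] - 2 * pred[1:-1] + pred[2:]
    acc_g = gt[:-2] - 2 * gt[1:-1] + gt[2:]
    return float(np.linalg.norm(acc_p - acc_g, axis=-1).mean())
```

Each function takes `pred` and `gt` as arrays of shape `(T, J, 3)`, with T frames and J joints, in the same units (for example millimetres). PA-MPJPE removes global scale, rotation, and translation before measuring error, which is why it is lower than MPJPE in the table; the acceleration error instead reflects how smooth the predicted motion is over time.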