Albert Haque; Boya Peng; Zelun Luo; Alexandre Alahi; Serena Yeung; Li Fei-Fei

Abstract
We propose a viewpoint invariant model for 3D human pose estimation from a single depth image. To achieve this, our discriminative model embeds local regions into a learned viewpoint invariant feature space. Formulated as a multi-task learning problem, our model is able to selectively predict partial poses in the presence of noise and occlusion. Our approach leverages a convolutional and recurrent network architecture with a top-down error feedback mechanism to self-correct previous pose estimates in an end-to-end manner. We evaluate our model on a previously published depth dataset and a newly collected human pose dataset containing 100K annotated depth images from extreme viewpoints. Experiments show that our model achieves competitive performance on frontal views while achieving state-of-the-art performance on alternate viewpoints.
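The self-correction idea in the abstract, where each step feeds the previous pose estimate back in and predicts a correction to it, can be illustrated with a small loop. This is only a sketch under stated assumptions, not the authors' implementation: `refine_pose`, `make_toy_corrector`, the visibility threshold, and the step count are hypothetical names and values, and the toy corrector stands in for the trained convolutional and recurrent network. Joints whose predicted visibility is low are returned as `None`, mirroring the idea of predicting a partial pose under occlusion.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Joint = Tuple[float, ...]                 # (x, y, z) coordinates of one joint
Prediction = Tuple[Joint, float]          # (offset to apply, visibility probability)


def refine_pose(
    initial: Sequence[Joint],
    predict_correction: Callable[[Sequence[Joint]], List[Prediction]],
    steps: int = 10,
    visibility_threshold: float = 0.5,
) -> List[Optional[Joint]]:
    """Iteratively refine a pose by adding predicted offsets to the last estimate.

    Hypothetical sketch: `predict_correction` stands in for a learned model.
    """
    pose: List[Joint] = [tuple(j) for j in initial]
    visible = [True] * len(pose)
    for _ in range(steps):
        outputs = predict_correction(pose)   # feedback: model sees the last estimate
        updated: List[Joint] = []
        for i, (joint, (offset, p_visible)) in enumerate(zip(pose, outputs)):
            updated.append(tuple(c + d for c, d in zip(joint, offset)))
            visible[i] = p_visible >= visibility_threshold
        pose = updated
    # Report only joints judged visible; occluded joints are left out (None).
    return [j if v else None for j, v in zip(pose, visible)]


def make_toy_corrector(
    target: Sequence[Joint], visible_flags: Sequence[bool], rate: float = 0.5
) -> Callable[[Sequence[Joint]], List[Prediction]]:
    """Toy stand-in for a trained network: steps a fraction of the way to `target`."""
    def corrector(pose: Sequence[Joint]) -> List[Prediction]:
        out: List[Prediction] = []
        for joint, goal, flag in zip(pose, target, visible_flags):
            offset = tuple(rate * (g - c) for g, c in zip(goal, joint))
            out.append((offset, 1.0 if flag else 0.0))
        return out
    return corrector
```

With a real model, the corrector would take the depth image and the current estimate as input; here the target is known in advance purely so that the loop can be run and inspected.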
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| pose-estimation-on-itop-front-view | Multi-task learning + viewpoint invariance | Mean mAP: 77.4 |
| pose-estimation-on-itop-top-view | Multi-task learning + viewpoint invariance | Mean mAP: 75.5 |
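The table reports mean mAP without defining it. A common convention for depth-based pose benchmarks counts a joint as detected when its predicted 3D position lies within a fixed distance of the ground truth, and reports the percentage of detected joints. The sketch below assumes that convention and a 10 cm threshold; neither is stated on this page, and `joint_detection_rate` is a hypothetical helper, not part of any benchmark toolkit.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]  # coordinates in meters (assumed unit)


def joint_detection_rate(
    predicted: Sequence[Point3D],
    ground_truth: Sequence[Point3D],
    threshold_m: float = 0.10,  # assumed 10 cm threshold, not given in the source
) -> float:
    """Return the percentage of joints predicted within `threshold_m` of ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one joint is required")
    hits = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= threshold_m
    )
    return 100.0 * hits / len(predicted)
```

Averaging this rate over all joints and all test frames would give a single figure comparable in form to the values in the table, under the assumptions above.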