DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders
Garau Nicola ; Bisagno Niccolò ; Bródka Piotr ; Conci Nicola

Abstract
Human Pose Estimation (HPE) aims at retrieving the 3D position of human joints from images or videos. We show that current 3D HPE methods suffer from a lack of viewpoint equivariance: they tend to fail or perform poorly when dealing with viewpoints unseen at training time. Deep learning methods often rely on either scale-invariant, translation-invariant, or rotation-invariant operations, such as max-pooling. However, the adoption of such procedures does not necessarily improve viewpoint generalization, and instead leads to more data-dependent methods. To tackle this issue, we propose a novel capsule autoencoder network with fast Variational Bayes capsule routing, named DECA. By modeling each joint as a capsule entity, combined with the routing algorithm, our approach can preserve the joints' hierarchical and geometrical structure in the feature space, independently of the viewpoint. By achieving viewpoint equivariance, we drastically reduce the network's data dependency at training time, resulting in an improved ability to generalize to unseen viewpoints. In the experimental validation, we outperform other methods on depth images from both seen and unseen viewpoints, in both top-view and front-view settings. In the RGB domain, the same network gives state-of-the-art results on the challenging viewpoint transfer task, also establishing a new framework for top-view HPE. The code can be found at https://github.com/mmlab-cv/DECA.
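The distinction between invariance and equivariance that the abstract relies on can be illustrated with a toy example. The sketch below is not the DECA network or its routing algorithm; the functions `center_joints` and `max_joint_norm` are hypothetical stand-ins chosen only because their behavior under rotation is easy to verify. An equivariant map commutes with a viewpoint change (rotating the input rotates the output), whereas an invariant map, like a pooled summary, returns the same value and so discards the viewpoint.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis (a stand-in for a viewpoint change)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def center_joints(joints):
    """Toy equivariant map: express each joint relative to the centroid."""
    return joints - joints.mean(axis=0, keepdims=True)

def max_joint_norm(joints):
    """Toy invariant map: a max-pooled summary (largest joint distance from origin)."""
    return np.linalg.norm(joints, axis=1).max()

# Hypothetical skeleton: 15 joints with random 3D coordinates.
rng = np.random.default_rng(0)
joints = rng.normal(size=(15, 3))

R = rotation_z(np.pi / 3)
rotated = joints @ R.T  # the same skeleton seen from a rotated viewpoint

# Equivariance: transforming the input transforms the output the same way.
is_equivariant = np.allclose(center_joints(rotated), center_joints(joints) @ R.T)

# Invariance: the pooled summary is unchanged, so the viewpoint cannot be recovered from it.
is_invariant = np.isclose(max_joint_norm(rotated), max_joint_norm(joints))
```

Both checks hold for this toy case. The point is only that the equivariant output still carries the geometric relationship between joints and viewpoint, while the invariant summary does not.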
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| pose-estimation-on-itop-front-view | DECA-D3 | Mean mAP: 88.75 |
| pose-estimation-on-itop-top-view | DECA-D3 | Mean mAP: 86.92 |