8 months ago

Computer Vision

Multi-Task Learning

Multimodal Representation

Method/Architecture

Vandad Davoodnia, Saeed Ghorbani, Alexandre Messier, Ali Etemad

Abstract

We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations. This module integrates prior knowledge about pose space and infers the full pose state at runtime. Separating the 3D keypoint detection and inverse-kinematic problems, along with the expressive representations learned by our skeletal transformer, enhances the generalization of our method to unseen noisy data. We evaluate our method on three public datasets in both in-distribution and out-of-distribution settings and observe strong performance with respect to prior works. Moreover, ablation experiments demonstrate the impact of each of the modules of our architecture. Finally, we study the performance of our method in dealing with noise and heavy occlusions and find considerable robustness with respect to other solutions.
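
The abstract does not say how the per-view 2D keypoints are lifted to 3D joint positions. One common option in multi-view setups is triangulation, sketched below as a least-squares intersection of back-projected rays. This is an illustrative assumption, not the authors' implementation: the function name `triangulate_rays`, the ray-based formulation, and the camera layout in the usage note are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Sequence[float]) -> Vec3:
    """Return v scaled to unit length."""
    n = sum(c * c for c in v) ** 0.5
    return (v[0] / n, v[1] / n, v[2] / n)


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_rays(origins: Sequence[Vec3], directions: Sequence[Vec3]) -> Vec3:
    """Find the 3D point closest (in least squares) to a set of rays.

    Each ray is assumed to start at a camera centre and pass through that
    camera's 2D detection of the same joint. The point p minimising the
    summed squared distance to all rays solves
        sum_k (I - d_k d_k^T) p = sum_k (I - d_k d_k^T) o_k.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = _normalize(d)
        for i in range(3):
            for j in range(3):
                # Entry (i, j) of the projector onto the plane orthogonal to d.
                p_ij = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p_ij
                b[i] += p_ij * o[j]

    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; the point is not determined")

    # Solve the 3x3 system A p = b with Cramer's rule.
    solution = []
    for k in range(3):
        A_k = [row[:] for row in A]
        for i in range(3):
            A_k[i][k] = b[i]
        solution.append(_det3(A_k) / det)
    return (solution[0], solution[1], solution[2])
```

For example, with three hypothetical camera centres at `(0, 0, 0)`, `(4, 0, 0)` and `(0, 3, 0)` and noise-free rays pointing at a joint, the function recovers that joint's position up to floating-point error. With noisy detections the rays no longer meet, and the result is the point closest to all of them.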
