Total capture: 3D human pose estimation fusing video and inertial sensors

Matthew Trumble, Charles Malleson, Adrian Hilton, Andrew Gilbert, and John Collomosse

Abstract

We present an algorithm for fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data to accurately estimate 3D human pose. A 3D convolutional neural network is used to learn a pose embedding from volumetric probabilistic visual hull (PVH) data derived from the MVV frames. We incorporate this model within a dual-stream network integrating pose embeddings derived from MVV and a forward kinematic solve of the IMU data. A temporal model (LSTM) is incorporated within both streams prior to their fusion. Hybrid pose inference using these two complementary data sources is shown to resolve ambiguities within each sensor modality, yielding improved accuracy over prior methods. A further contribution of this work is a new hybrid MVV dataset (TotalCapture) comprising video, IMU data, and a skeletal joint ground truth derived from a commercial motion capture system. The dataset is available online at http://cvssp.org/data/totalcapture/
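
The abstract mentions a forward kinematic solve of the IMU data but gives no details. The following is a minimal sketch of what such a solve can look like, not the authors' implementation. It assumes each IMU supplies a global orientation for one bone, expressed as a 3x3 rotation matrix, and that the skeleton is given as parent indices plus rest-pose offsets. The function names, the three-joint chain, and the offset convention are all hypothetical and chosen only for illustration.

```python
import math


def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def forward_kinematics(parents, offsets, global_rots, root_pos=(0.0, 0.0, 0.0)):
    """Compute joint positions from per-bone global orientations.

    Assumed conventions (hypothetical, for illustration only):
      parents[j]     index of joint j's parent, -1 for the root;
                     parents must appear before their children
      offsets[j]     rest-pose offset of joint j in its parent's frame
      global_rots[j] global orientation of the bone rooted at joint j
    """
    positions = [None] * len(parents)
    for j, p in enumerate(parents):
        if p < 0:
            positions[j] = list(root_pos)
        else:
            # Rotate the rest-pose offset by the parent's global orientation,
            # then translate by the parent's already-computed position.
            step = mat_vec(global_rots[p], offsets[j])
            positions[j] = [positions[p][k] + step[k] for k in range(3)]
    return positions


# Hypothetical three-joint chain with unit-length bones along x at rest.
parents = [-1, 0, 1]
offsets = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
# The second bone is rotated 90 degrees about z; the last entry is unused.
global_rots = [rot_z(0), rot_z(90), rot_z(0)]
joints = forward_kinematics(parents, offsets, global_rots)
```

In this toy chain the first bone stays along the x axis while the second bone points along y, so the end joint lands near (1, 1, 0). Because global orientations are used directly, no rotations need to be chained down the hierarchy; a solve driven by joint-local rotations would instead compose them from the root outward.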

Benchmarks

Benchmark                                    Methodology   Metrics
3d-human-pose-estimation-on-human36m         PVH-TSP       Average MPJPE (mm): 57.0
3d-human-pose-estimation-on-total-capture    IMUPVH        Average MPJPE (mm): 70
3d-human-pose-estimation-on-total-capture    PVH           Average MPJPE (mm): 107
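
The metric reported above, MPJPE (mean per-joint position error), is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints. The sketch below shows that computation for a single frame. This page does not state the evaluation protocol behind the listed figures (joint set, alignment, averaging over frames and sequences), so the function name and the example coordinates are illustrative assumptions only.

```python
import math


def mpjpe(pred, truth):
    """Mean per-joint position error for one frame.

    pred, truth: equal-length sequences of (x, y, z) joint positions,
    here taken to be in millimetres.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    # Average the per-joint Euclidean distances.
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


# Hypothetical two-joint example: one exact joint, one joint 50 mm off.
pred = [(0.0, 0.0, 0.0), (30.0, 40.0, 0.0)]
truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(pred, truth)
```

Here the per-joint errors are 0 mm and 50 mm, giving an MPJPE of 25 mm. A dataset-level average would typically be obtained by averaging this value over all evaluated frames.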
