DeepFuse: An IMU-Aware Network for Real-Time 3D Human Pose Estimation from Multi-View Image
Fuyang Huang, Ailing Zeng, Minhao Liu, Qiuxia Lai, Qiang Xu

Abstract
In this paper, we propose a two-stage fully 3D network, namely **DeepFuse**, to estimate human pose in 3D space by deeply fusing body-worn Inertial Measurement Unit (IMU) data with multi-view images. The first stage is designed for pure vision estimation. To preserve the primitiveness of the multi-view inputs, the vision stage uses multi-channel volumes as the data representation and 3D soft-argmax as the activation layer. The second stage is the IMU refinement stage, which introduces an IMU-bone layer to fuse the IMU and vision data earlier, at the data level. Without requiring a skeleton model a priori, we achieve a mean joint error of 28.9 mm on the TotalCapture dataset and 13.4 mm on the Human3.6M dataset under Protocol 1, improving on the state-of-the-art results by a large margin. Finally, we experimentally discuss the effectiveness of a fully 3D network for 3D pose estimation, which may benefit future research.
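The abstract names 3D soft-argmax as the vision stage's activation layer. The sketch below is a minimal, generic PyTorch illustration of that operation (not the authors' code): a softmax over each joint's volumetric heatmap followed by the expectation of the voxel coordinates, which yields differentiable 3D joint locations. The tensor layout and the name `soft_argmax_3d` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def soft_argmax_3d(heatmaps):
    """Differentiable 3D soft-argmax over per-joint volumetric heatmaps.

    heatmaps: (B, J, D, H, W) raw scores, one volume per joint (assumed layout).
    Returns (B, J, 3) expected (x, y, z) coordinates in voxel units.
    """
    b, j, d, h, w = heatmaps.shape
    # Softmax over the flattened volume turns raw scores into a distribution.
    probs = F.softmax(heatmaps.view(b, j, -1), dim=-1).view(b, j, d, h, w)
    # Coordinate grids along each axis.
    zs = torch.arange(d, dtype=probs.dtype, device=probs.device)
    ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
    xs = torch.arange(w, dtype=probs.dtype, device=probs.device)
    # Expected coordinate = sum over voxels of probability * coordinate.
    z = (probs.sum(dim=(3, 4)) * zs).sum(dim=-1)
    y = (probs.sum(dim=(2, 4)) * ys).sum(dim=-1)
    x = (probs.sum(dim=(2, 3)) * xs).sum(dim=-1)
    return torch.stack([x, y, z], dim=-1)
```

Because the expectation is a weighted sum rather than a hard argmax, gradients flow through the coordinates, which is what lets a fully 3D network regress joint positions end to end.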
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-human-pose-estimation-on-human36m | DeepFuse | Average MPJPE (mm): 37.5; Multi-View or Monocular: Multi-View; Using 2D ground-truth joints: No |
| 3d-human-pose-estimation-on-total-capture | DeepFuse-IMU | Average MPJPE (mm): 28.9 |
| 3d-human-pose-estimation-on-total-capture | DeepFuse-Vision Only | Average MPJPE (mm): 32.7 |
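The IMU-bone layer is specific to DeepFuse and its internals are not described on this page, so the following sketch is purely hypothetical: it illustrates one generic way IMU readings could enter at the data level, by rotating an assumed rest-frame bone axis with each sensor's orientation quaternion to obtain unit bone-direction vectors that a network could consume alongside the vision volume. The `(w, x, y, z)` convention, the rest axis, and all names are assumptions, not the paper's method.

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate vector v by a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    r = np.array([x, y, z])
    # Standard efficient quaternion rotation: v' = v + 2 r x (r x v + w v).
    return v + 2.0 * np.cross(r, np.cross(r, v) + w * v)

def imu_bone_directions(imu_quats, rest_axis=np.array([0.0, 1.0, 0.0])):
    """Map each IMU's global orientation to a unit bone-direction vector.

    imu_quats: (N, 4) unit quaternions (w, x, y, z), one per sensor (assumption).
    rest_axis: bone direction in the sensor's rest frame (assumption).
    Returns (N, 3) unit vectors usable as a data-level orientation cue.
    """
    dirs = np.stack([quat_rotate(q, rest_axis) for q in imu_quats])
    return dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
```

A bone-direction representation like this is attractive for data-level fusion because it is independent of limb length, so it can constrain pose without requiring a skeleton model a priori, consistent with the claim in the abstract.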