DeepFuse: An IMU-Aware Network for Real-Time 3D Human Pose Estimation from Multi-View Image
Fuyang Huang, Ailing Zeng, Minhao Liu, Qiuxia Lai, Qiang Xu

Abstract
In this paper, we propose a two-stage fully 3D network, namely **DeepFuse**, to estimate human pose in 3D space by deeply fusing body-worn Inertial Measurement Unit (IMU) data with multi-view images. The first stage is designed for pure vision estimation. To preserve the primitiveness of the multi-view inputs, the vision stage uses multi-channel volumes as the data representation and 3D soft-argmax as the activation layer. The second stage is the IMU refinement stage, which introduces an IMU-bone layer to fuse the IMU and vision data earlier, at the data level. Without requiring a skeleton model a priori, we achieve a mean joint error of 28.9 mm on the TotalCapture dataset and 13.4 mm on the Human3.6M dataset under Protocol 1, improving on the state-of-the-art results by a large margin. Finally, we experimentally discuss the effectiveness of a fully 3D network for 3D pose estimation, which may benefit future research.
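The abstract names 3D soft-argmax as the vision stage's activation layer. The sketch below is a minimal, generic PyTorch illustration of that operation (not the authors' code): a softmax over each joint's volumetric heatmap followed by the expectation of the voxel coordinates, which yields differentiable 3D joint locations. The tensor layout and the name `soft_argmax_3d` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def soft_argmax_3d(heatmaps):
    """Differentiable 3D soft-argmax over per-joint volumetric heatmaps.

    heatmaps: (B, J, D, H, W) raw scores, one volume per joint (assumed layout).
    Returns (B, J, 3) expected (x, y, z) coordinates in voxel units.
    """
    b, j, d, h, w = heatmaps.shape
    # Softmax over the flattened volume turns raw scores into a distribution.
    probs = F.softmax(heatmaps.view(b, j, -1), dim=-1).view(b, j, d, h, w)
    # Coordinate grids along each axis.
    zs = torch.arange(d, dtype=probs.dtype, device=probs.device)
    ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
    xs = torch.arange(w, dtype=probs.dtype, device=probs.device)
    # Expected coordinate = sum over voxels of probability * coordinate.
    z = (probs.sum(dim=(3, 4)) * zs).sum(dim=-1)
    y = (probs.sum(dim=(2, 4)) * ys).sum(dim=-1)
    x = (probs.sum(dim=(2, 3)) * xs).sum(dim=-1)
    return torch.stack([x, y, z], dim=-1)
```

Because the expectation is a weighted sum rather than a hard argmax, gradients flow through the coordinates, which is what lets a fully 3D network regress joint positions end to end.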
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-human-pose-estimation-on-human36m | DeepFuse | Average MPJPE (mm): 37.5; Multi-View or Monocular: Multi-View; Using 2D ground-truth joints: No |
| 3d-human-pose-estimation-on-total-capture | DeepFuse-IMU | Average MPJPE (mm): 28.9 |
| 3d-human-pose-estimation-on-total-capture | DeepFuse-Vision Only | Average MPJPE (mm): 32.7 |
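The IMU-bone layer is specific to DeepFuse and its internals are not described on this page, so the following sketch is purely hypothetical: it illustrates one generic way IMU readings could enter at the data level, by rotating an assumed rest-frame bone axis with each sensor's orientation quaternion to obtain unit bone-direction vectors that a network could consume alongside the vision volume. The `(w, x, y, z)` convention, the rest axis, and all names are assumptions, not the paper's method.

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate vector v by a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    r = np.array([x, y, z])
    # Standard efficient quaternion rotation: v' = v + 2 r x (r x v + w v).
    return v + 2.0 * np.cross(r, np.cross(r, v) + w * v)

def imu_bone_directions(imu_quats, rest_axis=np.array([0.0, 1.0, 0.0])):
    """Map each IMU's global orientation to a unit bone-direction vector.

    imu_quats: (N, 4) unit quaternions (w, x, y, z), one per sensor (assumption).
    rest_axis: bone direction in the sensor's rest frame (assumption).
    Returns (N, 3) unit vectors usable as a data-level orientation cue.
    """
    dirs = np.stack([quat_rotate(q, rest_axis) for q in imu_quats])
    return dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
```

A bone-direction representation like this is attractive for data-level fusion because it is independent of limb length, so it can constrain pose without requiring a skeleton model a priori, consistent with the claim in the abstract.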