Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement
Wang Jian ; Cao Zhe ; Luvizon Diogo ; Liu Lingjie ; Sarkar Kripasindhu ; Tang Danhang ; Beeler Thabo ; Theobalt Christian

Abstract
In this work, we explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion. This task presents significant challenges due to three factors: the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion. To address these challenges, we propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are subsequently converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction. For hand tracking, we incorporate dedicated hand detection and hand pose estimation networks for regressing 3D hand poses. Finally, we develop a diffusion-based whole-body motion prior model to refine the estimated whole-body motion while accounting for joint uncertainties. To train these networks, we collect a large synthetic dataset, EgoWholeBody, comprising 840,000 high-quality egocentric images captured across a diverse range of whole-body motion sequences. Quantitative and qualitative evaluations demonstrate the effectiveness of our method in producing high-quality whole-body motion estimates from a single egocentric camera.
Benchmarks
| Benchmark | Methodology | Average MPJPE (mm) | PA-MPJPE (mm) |
|---|---|---|---|
| egocentric-pose-estimation-on-globalegomocap | EgoWholeMocap-Temporal | 65.83 | 53.47 |
| egocentric-pose-estimation-on-globalegomocap | EgoWholeMocap-Single Frame | 68.59 | 55.92 |
| egocentric-pose-estimation-on-sceneego | EgoWholeMocap-Single Frame | 64.19 | 50.06 |
| egocentric-pose-estimation-on-sceneego | EgoWholeMocap-Temporal | 57.59 | 46.55 |
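
The two metrics in the table are standard for 3D pose evaluation: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE is the same error after the prediction has been rigidly aligned to the ground truth (scale, rotation, translation) via Procrustes analysis. The following is a minimal NumPy sketch of how these metrics are commonly computed. It is not the authors' evaluation code; the function names are illustrative, and joints are assumed to be given as `(J, 3)` arrays in millimetres.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Similarity-align pred (J, 3) to gt (J, 3) with scale, rotation, translation."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g          # centre both point sets

    # SVD of the 3x3 cross-covariance gives the optimal rotation.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal isotropic scale for the centred, rotated prediction.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because PA-MPJPE removes global scale, rotation, and translation before measuring error, it is never larger than MPJPE for the same prediction, which matches the pattern in every row of the table.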