Wang Rong ; Mao Wei ; Li Hongdong

Abstract
3D hand-object pose estimation is the key to the success of many computer vision applications. The main focus of this task is to effectively model the interaction between the hand and an object. To this end, existing works either rely on interaction constraints in a computationally expensive iterative optimization, or consider only a sparse correlation between sampled hand and object keypoints. In contrast, we propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object. Specifically, we first construct the hand and object graphs according to their mesh structures. For each hand node, we aggregate features from every object node by the learned attention, and vice versa for each object node. Thanks to such dense mutual attention, our method is able to produce physically plausible poses with high quality and real-time inference speed. Extensive quantitative and qualitative experiments on large benchmark datasets show that our method outperforms state-of-the-art methods. The code is available at https://github.com/rongakowang/DenseMutualAttention.git.
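The aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes scaled dot-product attention with separate query, key, and value projections for each direction; the abstract does not specify the exact form, so this is one plausible reading rather than the authors' implementation. The function names, node counts, feature width, and random weights are all hypothetical placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, sources, w_q, w_k, w_v):
    """Each query node aggregates features from every source node.

    Assumption: scaled dot-product attention. The weights have shape
    (num_queries, num_sources), so the correlation is dense.
    """
    q = queries @ w_q
    k = sources @ w_k
    v = sources @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def dense_mutual_attention(hand_feats, obj_feats, params):
    """Hand nodes attend to all object nodes, and vice versa."""
    hand_ctx, hand_w = cross_attend(hand_feats, obj_feats, *params["hand_from_obj"])
    obj_ctx, obj_w = cross_attend(obj_feats, hand_feats, *params["obj_from_hand"])
    return hand_ctx, obj_ctx, hand_w, obj_w


# Hypothetical sizes: 6 hand nodes, 4 object nodes, 8-dimensional features.
rng = np.random.default_rng(0)
n_hand, n_obj, d = 6, 4, 8
hand_feats = rng.normal(size=(n_hand, d))
obj_feats = rng.normal(size=(n_obj, d))

# Random stand-ins for the learned projection matrices (w_q, w_k, w_v).
params = {
    name: tuple(rng.normal(size=(d, d)) * 0.1 for _ in range(3))
    for name in ("hand_from_obj", "obj_from_hand")
}

hand_ctx, obj_ctx, hand_w, obj_w = dense_mutual_attention(hand_feats, obj_feats, params)
print(hand_ctx.shape, obj_ctx.shape, hand_w.shape, obj_w.shape)
```

In this sketch every hand node receives a weighted sum over all object nodes, and every object node a weighted sum over all hand nodes, which is what distinguishes a dense correlation from one computed only between a few sampled keypoints.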
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-hand-pose-estimation-on-ho-3d | DMA | PA-MPJPE (mm): 10.1 |
| hand-object-pose-on-dexycb | DMA | ADD-S: 15.9; Average MPJPE (mm): 12.7; MCE: 32.6; OCE: 27.3; Procrustes-Aligned MPJPE: 6.86 |
| hand-object-pose-on-ho-3d | DMA | ADD-S: 20.8; Average MPJPE (mm): 22.2; OME: 45.5; PA-MPJPE: 10.1; ST-MPJPE: 23.8 |