GenPose: Generative Category-level Object Pose Estimation via Diffusion Models
Zhang Jiyao ; Wu Mingdong ; Dong Hao

Abstract
Object pose estimation plays a vital role in embodied AI and computer vision, enabling intelligent agents to comprehend and interact with their surroundings. Despite the practicality of category-level pose estimation, current approaches encounter challenges with partially observed point clouds, known as the multi-hypothesis issue. In this study, we propose a novel solution by reframing category-level object pose estimation as conditional generative modeling, departing from traditional point-to-point regression. Leveraging score-based diffusion models, we estimate object poses by sampling candidates from the diffusion model and aggregating them through a two-step process: filtering out outliers via likelihood estimation and subsequently mean-pooling the remaining candidates. To avoid the costly integration process when estimating the likelihood, we introduce an alternative method that trains an energy-based model from the original score-based model, enabling end-to-end likelihood estimation. Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on the strict 5°2cm and 5°5cm metrics, respectively. Furthermore, our method demonstrates strong generalizability to novel categories sharing similar symmetric properties without fine-tuning and can readily adapt to object pose tracking tasks, yielding results comparable to the current state-of-the-art baselines.
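The two-step aggregation described in the abstract (likelihood-based outlier filtering, then mean-pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate_pose_candidates`, the `keep_ratio` parameter and its default value, the quaternion layout `(w, x, y, z)`, and the use of eigenvector-based quaternion averaging for the rotation mean are all assumptions made for illustration. The likelihood scores are taken as given inputs, standing in for the output of the energy-based model.

```python
import numpy as np

def aggregate_pose_candidates(quats, trans, log_likelihoods, keep_ratio=0.6):
    """Filter pose candidates by likelihood, then mean-pool the survivors.

    quats: (K, 4) unit quaternions in (w, x, y, z) order (assumed layout).
    trans: (K, 3) translations.
    log_likelihoods: (K,) scores, higher means more plausible.
    keep_ratio: fraction of candidates kept (hypothetical default).
    """
    quats = np.asarray(quats, dtype=float)
    trans = np.asarray(trans, dtype=float)
    scores = np.asarray(log_likelihoods, dtype=float)

    # Step 1: keep the highest-scoring candidates, drop the rest as outliers.
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.argsort(scores)[::-1][:k]

    # Step 2a: average rotations. The mean quaternion is taken as the
    # eigenvector of sum(q q^T) with the largest eigenvalue, which is
    # insensitive to the q / -q sign ambiguity (an assumed choice here).
    q = quats[keep]
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    _, eigvecs = np.linalg.eigh(q.T @ q)  # eigenvalues in ascending order
    q_mean = eigvecs[:, -1]
    if q_mean[0] < 0:
        q_mean = -q_mean  # fix the sign so that w >= 0

    # Step 2b: average translations with a plain arithmetic mean.
    t_mean = trans[keep].mean(axis=0)
    return q_mean, t_mean
```

In this sketch a single implausible candidate with a very low score is discarded before pooling, so it cannot drag the averaged pose away from the consensus of the remaining samples.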
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 6d-pose-estimation-using-rgbd-on-real275 | GenPose https://github.com/Jiyao06/GenPose | mAP 10°, 2cm: 72.4; mAP 10°, 5cm: 84.0; mAP 5°, 2cm: 52.1; mAP 5°, 5cm: 60.9 |
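
The metrics in the table pair a rotation threshold in degrees with a translation threshold in centimetres: a predicted pose counts as correct only when both errors fall under their thresholds. The sketch below shows that per-instance check only. The function names are hypothetical, translations are assumed to be in metres, and the special handling of symmetric categories and the averaging into mAP used by the actual benchmark are left out.

```python
import numpy as np

def pose_errors(R_pred, t_pred, R_gt, t_gt):
    """Return (rotation error in degrees, translation error in cm)."""
    # Geodesic angle of the relative rotation between prediction and truth.
    R_rel = np.asarray(R_pred) @ np.asarray(R_gt).T
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_angle))
    # Translations are assumed to be in metres, hence the factor of 100.
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)
    trans_err_cm = 100.0 * np.linalg.norm(diff)
    return rot_err_deg, trans_err_cm

def within_threshold(R_pred, t_pred, R_gt, t_gt, max_deg, max_cm):
    """True if both errors are below the thresholds, e.g. 5 degrees and 2 cm."""
    rot_err_deg, trans_err_cm = pose_errors(R_pred, t_pred, R_gt, t_gt)
    return bool(rot_err_deg < max_deg and trans_err_cm < max_cm)
```

For example, a prediction rotated by 3° from the ground truth and offset by 3 cm would pass a 5°, 5cm check but fail the stricter 5°, 2cm check.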