PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation
Shizhe Chen; Ricardo Garcia; Cordelia Schmid; Ivan Laptev

Abstract
The ability of robots to comprehend and execute manipulation tasks based on natural language instructions is a long-standing goal in robotics. The dominant approaches for language-guided manipulation use 2D image representations, which makes it difficult to combine observations from multiple cameras and to infer precise 3D positions and relationships. To address these limitations, we propose PolarNet, a 3D point-cloud-based policy for language-guided manipulation. It leverages carefully designed point cloud inputs, efficient point cloud encoders, and multimodal transformers to learn 3D point cloud representations and integrate them with language instructions for action prediction. PolarNet is shown to be effective and data-efficient in a variety of experiments conducted on the RLBench benchmark. It outperforms state-of-the-art 2D and 3D approaches in both single-task and multi-task learning. It also achieves promising results on a real robot.
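The abstract describes a pipeline of three stages: encoding the point cloud, encoding the instruction, and fusing both modalities in a transformer to predict an action. Below is a minimal, hypothetical sketch of such a pipeline in PyTorch. The module choices (a PointNet-style per-point MLP, precomputed instruction embeddings, a 7-DoF pose plus gripper action head) and all dimensions are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a point-cloud + language manipulation policy.
# All module choices and dimensions are assumptions for illustration.
import torch
import torch.nn as nn


class PointCloudEncoder(nn.Module):
    """PointNet-style per-point MLP; returns one token per point."""
    def __init__(self, in_dim=6, hidden=256):  # xyz + rgb per point (assumed)
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, points):           # (B, N, 6)
        return self.mlp(points)          # (B, N, hidden) point tokens


class LanguageConditionedPolicy(nn.Module):
    """Fuses point tokens with instruction tokens in a transformer,
    then predicts an end-effector pose plus gripper state."""
    def __init__(self, dim=256, num_layers=4, action_dim=8):
        super().__init__()
        self.pc_encoder = PointCloudEncoder(hidden=dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.action_head = nn.Linear(dim, action_dim)

    def forward(self, points, text_tokens):
        # text_tokens: (B, T, dim) instruction embeddings from any
        # pretrained text encoder (assumed precomputed here).
        pc_tokens = self.pc_encoder(points)              # (B, N, dim)
        tokens = torch.cat([pc_tokens, text_tokens], 1)  # joint sequence
        fused = self.fusion(tokens)                      # cross-modal fusion
        return self.action_head(fused.mean(dim=1))       # pooled -> action


# Usage with random data: 2 scenes, 1024 points each, 16 instruction tokens.
policy = LanguageConditionedPolicy()
action = policy(torch.randn(2, 1024, 6), torch.randn(2, 16, 256))
print(action.shape)  # torch.Size([2, 8])
```

Concatenating point and text tokens into one sequence lets self-attention handle the cross-modal grounding; a behavior-cloning loss on the predicted actions would be the natural training objective under these assumptions.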
Benchmarks
| Benchmark | Methodology | Metric | Score |
|---|---|---|---|
| robot-manipulation-generalization-on-gembench | PolarNet | Average Success Rate | 38.4 |
| robot-manipulation-generalization-on-gembench | PolarNet | Average Success Rate (L1) | 77.7±0.9 |
| robot-manipulation-generalization-on-gembench | PolarNet | Average Success Rate (L2) | 37.1±1.4 |
| robot-manipulation-generalization-on-gembench | PolarNet | Average Success Rate (L3) | 38.5±1.7 |
| robot-manipulation-generalization-on-gembench | PolarNet | Average Success Rate (L4) | 0.1±0.2 |
| robot-manipulation-on-rlbench | PolarNet | Input Image Size | 128 |
| robot-manipulation-on-rlbench | PolarNet | Succ. Rate (10 tasks, 100 demos/task) | 89.8 |
| robot-manipulation-on-rlbench | PolarNet | Succ. Rate (18 tasks, 100 demos/task) | 46.4 |
| robot-manipulation-on-rlbench | PolarNet | Succ. Rate (74 tasks, 100 demos/task) | 60.3 |