Tony Z. Zhao, Vikash Kumar, Sergey Levine, Chelsea Finn

Abstract
Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery, with 80-90% success and only 10 minutes' worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/
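The abstract names action chunking without describing its mechanics. The following is a minimal sketch of the general idea, assuming the policy predicts a fixed-length sequence of actions per query and the robot executes that sequence before querying again. `rollout_with_chunks`, `toy_policy`, and `toy_step` are hypothetical names introduced for illustration; this is not the authors' implementation and contains no transformer or generative model.

```python
from typing import Callable, List, Tuple

# Hypothetical types for a 1-D toy problem: state and action are floats.
Policy = Callable[[float, int], List[float]]
StepFn = Callable[[float, float], float]


def rollout_with_chunks(
    policy: Policy,
    step: StepFn,
    obs: float,
    horizon: int,
    chunk_size: int,
) -> Tuple[List[float], int]:
    """Run one episode, querying the policy once per chunk of actions.

    Returns the executed actions and the number of policy queries.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    actions: List[float] = []
    queries = 0
    while len(actions) < horizon:
        # One query yields a whole sequence of actions (the "chunk").
        chunk = policy(obs, chunk_size)
        queries += 1
        # Execute the chunk, truncating it if the episode ends first.
        remaining = horizon - len(actions)
        for action in chunk[:remaining]:
            obs = step(obs, action)
            actions.append(action)
    return actions, queries


def toy_policy(obs: float, chunk_size: int) -> List[float]:
    # Stand-in for a learned model: always proposes small constant steps.
    return [0.1] * chunk_size


def toy_step(obs: float, action: float) -> float:
    # Stand-in for the environment: the action is added to the state.
    return obs + action


if __name__ == "__main__":
    for k in (1, 4, 10):
        _, n = rollout_with_chunks(toy_policy, toy_step, 0.0, 10, k)
        print(f"chunk_size={k}: {n} policy queries over 10 steps")
```

With a larger chunk size the policy is consulted fewer times per episode, so there are fewer decision points at which a prediction error can feed back into later predictions. The trade-off in this simplified version is that actions inside a chunk are executed without fresh observations.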
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| robot-manipulation-generalization-on-the | ACT | Average decrease across all perturbations: -61.8 |
| robot-manipulation-on-mimicgen | ACT (Evaluated in EquiDiff) | Succ. Rate (12 tasks, 100 demo/task): 21.3; Succ. Rate (12 tasks, 200 demo/task): 38.2; Succ. Rate (12 tasks, 1000 demo/task): 63.3 |