5 months ago

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, Jianwei Yang

Abstract

Although large vision-language-action (VLA) models pretrained on extensive robot datasets offer promising generalist policies for robotic learning, they still struggle with spatial-temporal dynamics in interactive robotics, making them less effective in handling complex tasks such as manipulation. In this work, we introduce visual trace prompting, a simple yet effective approach to facilitate VLA models' spatial-temporal awareness for action prediction by encoding state-action trajectories visually. We develop a new TraceVLA model by finetuning OpenVLA on our own collected dataset of 150K robot manipulation trajectories using visual trace prompting. Evaluations of TraceVLA across 137 configurations in SimplerEnv and 4 tasks on a physical WidowX robot demonstrate state-of-the-art performance, outperforming OpenVLA by 10% on SimplerEnv and 3.5x on real-robot tasks and exhibiting robust generalization across diverse embodiments and scenarios. To further validate the effectiveness and generality of our method, we present a compact VLA model based on the 4B Phi-3-Vision, pretrained on Open-X-Embodiment and finetuned on our dataset, which rivals the 7B OpenVLA baseline while significantly improving inference efficiency.
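As a rough illustration of what "encoding state-action trajectories visually" could look like, the sketch below draws a 2D point trace onto an image so that past motion becomes part of the visual input. It is only a toy under stated assumptions: the abstract does not describe how traces are obtained or rendered, so the function name `overlay_trace`, the grayscale list-of-lists image, and the straight-line interpolation are all hypothetical choices, not the authors' pipeline.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities
Point = Tuple[int, int]  # (row, col) pixel coordinate


def overlay_trace(image: Image, trace: List[Point], value: int = 255) -> Image:
    """Return a copy of `image` with the polyline through `trace` drawn on it.

    Hypothetical helper: assumes `trace` already holds pixel coordinates of a
    tracked point (for example, the gripper) over the preceding frames.
    """
    out = [row[:] for row in image]  # copy so the input frame is untouched
    height, width = len(out), len(out[0])
    if len(trace) == 1:
        segments = [(trace[0], trace[0])]  # a single point is drawn as a dot
    else:
        segments = list(zip(trace, trace[1:]))  # consecutive point pairs
    for (r0, c0), (r1, c1) in segments:
        # Sample enough points along the segment to leave no pixel gaps.
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        for i in range(steps + 1):
            r = round(r0 + (r1 - r0) * i / steps)
            c = round(c0 + (c1 - c0) * i / steps)
            if 0 <= r < height and 0 <= c < width:  # clip to image bounds
                out[r][c] = value
    return out


# Usage: draw a diagonal trace on a blank 5x5 frame.
frame = [[0] * 5 for _ in range(5)]
traced = overlay_trace(frame, [(0, 0), (4, 4)])
```

In a setup like the one the abstract describes, an image annotated this way would be given to the model as visual context; how the trace is combined with the original frame is not specified in the abstract.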

Benchmarks

Benchmark: robot-manipulation-on-simpler-env
Methodology: TraceVLA
Metrics:
  Variant Aggregation: 0.450
  Variant Aggregation-Move Near: 0.564
  Variant Aggregation-Open/Close Drawer: 0.310
  Variant Aggregation-Pick Coke Can: 0.600
  Visual Matching: 0.460
  Visual Matching-Move Near: 0.600
  Visual Matching-Open/Close Drawer: 0.240
  Visual Matching-Pick Coke Can: 0.560
