6 months ago

Zekun Qi, Wenyao Zhang, Yufei Ding, Runpei Dong, Xinqiang Yu, Jingwen Li, Lingyun Xu, Baoyu Li, Xialin He, Guofan Fan, and 8 more

Abstract

Spatial intelligence is a critical component of embodied AI, enabling robots to understand and interact with their environments. While recent advances have enhanced the ability of VLMs to perceive object locations and positional relationships, they still lack the capability to precisely understand object orientations, a key requirement for tasks involving fine-grained manipulation. Addressing this limitation requires not only geometric reasoning but also an expressive and intuitive way to represent orientation. In this context, we propose that natural language offers a more flexible representation space than canonical frames, making it particularly suitable for instruction-following robotic systems. In this paper, we introduce the concept of semantic orientation, which defines object orientations using natural language in a reference-frame-free manner (e.g., the "plug-in" direction of a USB or the "handle" direction of a knife). To support this, we construct OrienText300K, a large-scale dataset of 3D models annotated with semantic orientations that link geometric understanding to functional semantics. By integrating semantic orientation into a VLM system, we enable robots to generate manipulation actions with both positional and orientational constraints. Extensive experiments in simulation and the real world demonstrate that our approach significantly enhances robotic manipulation capabilities, e.g., 48.7% accuracy on Open6DOR and 74.9% accuracy on SIMPLER.

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
