Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

Abstract

Visually grounded language understanding and reasoning is one of the most challenging topics in Natural Language Processing (NLP). Outdoor vision-and-language navigation (VLN) is one such task: an agent follows natural language instructions to navigate a real-life urban environment. Outdoor VLN remains difficult because few human-annotated instructions describe intricate urban scenes. This paper introduces a Multimodal Text Style Transfer (MTST) learning approach that leverages external multimodal resources to mitigate data scarcity in outdoor navigation tasks. We first enrich the navigation data by transferring the style of instructions generated by the Google Maps API, then pre-train the navigator on the augmented external outdoor navigation dataset. Experimental results show that the MTST learning approach is model-agnostic and significantly outperforms the baseline models on the outdoor VLN task, with a relative improvement of 8.7% in task completion rate on the test set.

Code Repositories

VegB/VLN-Transformer (PyTorch, mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Task Completion (TC) |
| --- | --- | --- |
| vision-and-language-navigation-on-touchdown | VLN Transformer | 14.9 |
| vision-and-language-navigation-on-touchdown | Gated Attention (GA) | 11.9 |
| vision-and-language-navigation-on-touchdown | RConcat | 11.8 |
| vision-and-language-navigation-on-touchdown | VLN Transformer +M-50 +style | 16.2 |
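
The relative improvement quoted in the abstract can be checked against the Task Completion values listed above. The sketch below assumes that the 8.7% figure compares the plain VLN Transformer with the VLN Transformer +M-50 +style entry; the listing does not state this pairing explicitly, and the helper name `relative_improvement` is hypothetical, introduced only for illustration.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative gain of `improved` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


# Task Completion (TC) values copied from the benchmark listing.
tc_scores = {
    "RConcat": 11.8,
    "Gated Attention (GA)": 11.9,
    "VLN Transformer": 14.9,
    "VLN Transformer +M-50 +style": 16.2,
}

# Assumption: the abstract's figure compares these two entries.
gain = relative_improvement(
    tc_scores["VLN Transformer"],
    tc_scores["VLN Transformer +M-50 +style"],
)
print(f"Relative TC improvement: {gain:.1f}%")
```

Under that assumption the absolute gain is 1.3 TC points (16.2 versus 14.9), which works out to roughly 8.7% relative to the baseline and matches the number reported in the abstract.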
