A Visual Attention Grounding Neural Model for Multimodal Machine Translation

Mingyang Zhou; Runxiang Cheng; Yong Jae Lee; Zhou Yu

Abstract

We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information. Our model jointly optimizes the learning of a shared visual-language embedding and a translator. The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics. Our approach achieves competitive state-of-the-art results on the Multi30K and the Ambiguous COCO datasets. We also collected a new multilingual multimodal product description dataset to simulate a real-world international online shopping scenario. On this dataset, our visual attention grounding model outperforms other methods by a large margin.
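The repository listed below is implemented in PyTorch. As a rough illustration of the mechanism described in the abstract, the following is a minimal sketch of a visual attention grounding step together with a max-margin alignment loss for the shared visual-language embedding. All module names, dimensions, and the specific loss form are assumptions made for illustration; this is not the authors' exact VAG-NMT implementation.

```python
# Minimal sketch (illustrative assumptions, not the official VAG-NMT code):
# a sentence embedding attends over image region features to form a grounded
# visual context, and a max-margin loss over in-batch negatives encourages
# matching sentence/image pairs to align in a shared embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualAttentionGrounding(nn.Module):
    def __init__(self, text_dim=512, img_dim=2048, joint_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, joint_dim)  # project sentence embedding
        self.img_proj = nn.Linear(img_dim, joint_dim)    # project image region features

    def forward(self, sent_emb, img_regions):
        # sent_emb: (batch, text_dim); img_regions: (batch, n_regions, img_dim)
        q = self.text_proj(sent_emb)                              # (batch, joint_dim)
        k = self.img_proj(img_regions)                            # (batch, n_regions, joint_dim)
        scores = torch.bmm(k, q.unsqueeze(2)).squeeze(2)          # (batch, n_regions)
        attn = F.softmax(scores, dim=1)                           # attention over regions
        visual_ctx = torch.bmm(attn.unsqueeze(1), k).squeeze(1)   # grounded visual context
        return visual_ctx, q, attn


def margin_alignment_loss(visual_ctx, text_emb, margin=0.1):
    # Ranking loss: each sentence embedding should be closer (in cosine
    # similarity) to its own grounded visual context than to other images
    # in the batch by at least `margin`.
    v = F.normalize(visual_ctx, dim=1)
    t = F.normalize(text_emb, dim=1)
    sim = t @ v.t()                                  # (batch, batch) similarity matrix
    pos = sim.diag().unsqueeze(1)                    # similarity of matching pairs
    cost = (margin + sim - pos).clamp(min=0)
    mask = torch.eye(sim.size(0), device=sim.device, dtype=torch.bool)
    cost = cost.masked_fill(mask, 0.0)               # ignore the positive pairs themselves
    return cost.mean()
```

In training, an alignment term of this kind would be added to the usual translation cross-entropy loss so that the shared visual-language embedding and the translator are optimized jointly, as described in the abstract.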

Code Repositories

Eurus-Holmes/VAG-NMT (PyTorch)

Benchmarks

Benchmark: multimodal-machine-translation-on-multi30k
Methodology: VAG-NMT
Metrics:
  BLEU (EN-DE): 31.6
  Meteor (EN-DE): 52.2
  Meteor (EN-FR): 70.3
