Multimodal Transformer for Multimodal Machine Translation

Shaowei Yao, Xiaojun Wan


Abstract

Multimodal Machine Translation (MMT) aims to introduce information from another modality, generally static images, to improve translation quality. Previous works propose various incorporation methods, but most of them do not consider the relative importance of the modalities. Treating all modalities equally may encode too much useless information from the less important ones. In this paper, we introduce multimodal self-attention in the Transformer to address these issues in MMT. The proposed method learns the representation of images conditioned on the text, which avoids encoding irrelevant information from the images. Experiments and visualization analysis demonstrate that our model benefits from visual information and substantially outperforms previous works and competitive baselines on various metrics.
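The core idea of text-conditioned image encoding can be illustrated with a minimal single-head attention sketch. This is an assumption-laden illustration, not the paper's implementation: here the queries come only from the text tokens while keys and values come from the concatenation of text and image features, so the output is a text-length sequence in which image information is attended to only insofar as the text deems it relevant. All names, shapes, and the random toy data are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_self_attention(text, image, Wq, Wk, Wv):
    """Single-head sketch of multimodal self-attention.

    text:  (T, d) text token representations
    image: (I, d) image region features (projected to the same dim)
    Queries are computed from text only; keys/values from [text; image],
    so the image representation is learned conditioned on the text.
    """
    ctx = np.concatenate([text, image], axis=0)     # (T+I, d) joint context
    Q = text @ Wq                                   # queries: text only
    K = ctx @ Wk                                    # keys: text + image
    V = ctx @ Wv                                    # values: text + image
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d))            # (T, T+I) attention weights
    return attn @ V                                 # (T, d) text-conditioned output

# Toy usage with random features (shapes are illustrative).
rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))    # 5 text tokens
image = rng.normal(size=(3, d))   # 3 image region features
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = multimodal_self_attention(text, image, Wq, Wk, Wv)
# output has one row per text token, never per image region
```

Because the output length equals the text length, irrelevant image regions simply receive low attention weight rather than being forced into the encoding, which matches the abstract's claim of avoiding irrelevant visual information.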

Benchmarks

Benchmark: multimodal-machine-translation-on-multi30k
Methodology: Multimodal Transformer
Metrics:
  BLEU (EN-DE): 38.7
  Meteor (EN-DE): 55.7
