HyperAIHyperAI

Command Palette

Search for a command to run...

4 months ago

An Actor-Critic Algorithm for Sequence Prediction

Dzmitry Bahdanau; Philemon Brakel; Kelvin Xu; Anirudh Goyal; Ryan Lowe; Joelle Pineau; Aaron Courville; Yoshua Bengio

An Actor-Critic Algorithm for Sequence Prediction

Abstract

We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a \textit{critic} network that is trained to predict the value of an output token, given the policy of an \textit{actor} network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task, and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.

Code Repositories

juliakreutzer/joeynmt
pytorch
Mentioned in GitHub
joeynmt/joeynmt
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
machine-translation-on-iwslt2014-germanActor-Critic [Bahdanau2017]
BLEU score: 28.53
machine-translation-on-iwslt2015-englishRNNsearch
BLEU score: 25.04
machine-translation-on-iwslt2015-germanRNNsearch
BLEU score: 29.98

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp