Mogrifier LSTM
Gábor Melis; Tomáš Kočiský; Phil Blunsom

Abstract
Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
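The mutual gating described above can be sketched as follows: before the usual LSTM update, the input and the previous hidden state take turns rescaling each other through sigmoid gates. This is a minimal NumPy illustration, not the authors' implementation; the matrix shapes and the number of gating rounds (here two per pair of matrices) are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h_prev, Qs, Rs):
    """Mutually gate input x and previous state h_prev before the LSTM step.

    Odd rounds rescale x by a gate computed from h_prev; even rounds
    rescale h_prev by a gate computed from the updated x. The factor 2
    keeps the expected scale of the activations unchanged, since the
    sigmoid gates average around 0.5.
    """
    for Q, R in zip(Qs, Rs):
        x = 2.0 * sigmoid(Q @ h_prev) * x        # odd round: context gates the input
        h_prev = 2.0 * sigmoid(R @ x) * h_prev   # even round: input gates the context
    return x, h_prev
```

The modified `x` and `h_prev` then feed into an otherwise standard LSTM cell, which is what makes the transition function context-dependent: the same input vector is transformed differently depending on the state it meets.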
Benchmarks
| Benchmark | Model | Metrics | Params |
|---|---|---|---|
| language-modelling-on-enwiki8 | LSTM | BPC: 1.195 | 48M |
| language-modelling-on-enwiki8 | Mogrifier LSTM | BPC: 1.146 | 48M |
| language-modelling-on-hutter-prize | Mogrifier LSTM | BPC: 1.122 | 96M |
| language-modelling-on-hutter-prize | Mogrifier LSTM + dynamic eval | BPC: 0.988 | 96M |
| language-modelling-on-penn-treebank-character | Mogrifier LSTM | BPC: 1.120 | 24M |
| language-modelling-on-penn-treebank-character | Mogrifier LSTM + dynamic eval | BPC: 1.083 | 24M |
| language-modelling-on-penn-treebank-word | Mogrifier LSTM + dynamic eval | Val perplexity: 44.8, Test perplexity: 44.9 | 24M |
| language-modelling-on-wikitext-2 | Mogrifier LSTM | Val perplexity: 57.3, Test perplexity: 55.1 | 35M |
| language-modelling-on-wikitext-2 | Mogrifier LSTM + dynamic eval | Val perplexity: 40.2, Test perplexity: 38.6 | 35M |