Mogrifier LSTM
Gábor Melis; Tomáš Kočiský; Phil Blunsom

Abstract
Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
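The mutual gating described above can be sketched as follows: before the usual LSTM update, the input and the previous hidden state take turns rescaling each other through sigmoid gates. This is a minimal NumPy illustration, not the authors' implementation; the matrix shapes and the number of gating rounds (here two per pair of matrices) are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h_prev, Qs, Rs):
    """Mutually gate input x and previous state h_prev before the LSTM step.

    Odd rounds rescale x by a gate computed from h_prev; even rounds
    rescale h_prev by a gate computed from the updated x. The factor 2
    keeps the expected scale of the activations unchanged, since the
    sigmoid gates average around 0.5.
    """
    for Q, R in zip(Qs, Rs):
        x = 2.0 * sigmoid(Q @ h_prev) * x        # odd round: context gates the input
        h_prev = 2.0 * sigmoid(R @ x) * h_prev   # even round: input gates the context
    return x, h_prev
```

The modified `x` and `h_prev` then feed into an otherwise standard LSTM cell, which is what makes the transition function context-dependent: the same input vector is transformed differently depending on the state it meets.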
Benchmarks
| Benchmark | Model | Metrics | Params |
|---|---|---|---|
| language-modelling-on-enwiki8 | LSTM | BPC: 1.195 | 48M |
| language-modelling-on-enwiki8 | Mogrifier LSTM | BPC: 1.146 | 48M |
| language-modelling-on-hutter-prize | Mogrifier LSTM | BPC: 1.122 | 96M |
| language-modelling-on-hutter-prize | Mogrifier LSTM + dynamic eval | BPC: 0.988 | 96M |
| language-modelling-on-penn-treebank-character | Mogrifier LSTM | BPC: 1.120 | 24M |
| language-modelling-on-penn-treebank-character | Mogrifier LSTM + dynamic eval | BPC: 1.083 | 24M |
| language-modelling-on-penn-treebank-word | Mogrifier LSTM + dynamic eval | Val perplexity: 44.8, Test perplexity: 44.9 | 24M |
| language-modelling-on-wikitext-2 | Mogrifier LSTM | Val perplexity: 57.3, Test perplexity: 55.1 | 35M |
| language-modelling-on-wikitext-2 | Mogrifier LSTM + dynamic eval | Val perplexity: 40.2, Test perplexity: 38.6 | 35M |