Rafal Jozefowicz; Oriol Vinyals; Mike Schuster; Noam Shazeer; Yonghui Wu

Abstract
In this work we explore recent advances in Recurrent Neural Networks for large-scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language. We perform an exhaustive study of techniques such as character Convolutional Neural Networks and Long Short-Term Memory networks on the One Billion Word Benchmark. Our best single model significantly improves the state-of-the-art perplexity from 51.3 down to 30.0 (while reducing the number of parameters by a factor of 20), and an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.
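The abstract's "LSTM + CNN Input" models feed each word into the LSTM as a fixed-size vector computed by a character-level CNN rather than a word-embedding lookup, which keeps the input side independent of vocabulary size. Below is a minimal sketch of that idea in PyTorch; it is not the authors' released code, and all layer sizes, names, and the small vocabulary are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch only: a word-level LSTM language model whose word
# representations come from a character-level CNN, in the spirit of the
# "LSTM + CNN Input" models. Sizes and names are assumptions, not the
# paper's released configuration.
import torch
import torch.nn as nn


class CharCNNLSTMLM(nn.Module):
    def __init__(self, n_chars=256, char_dim=16, n_filters=128,
                 kernel_size=5, hidden_size=1024, vocab_size=100000):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Character convolution producing a fixed-size word representation.
        self.char_cnn = nn.Conv1d(char_dim, n_filters, kernel_size, padding=2)
        self.lstm = nn.LSTM(n_filters, hidden_size, batch_first=True)
        # Full softmax over the word vocabulary; the paper studies cheaper
        # approximations (e.g. importance sampling) for this layer.
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, word_len) character indices per word.
        b, t, w = char_ids.shape
        x = self.char_emb(char_ids.view(b * t, w))      # (b*t, w, char_dim)
        x = self.char_cnn(x.transpose(1, 2))            # (b*t, filters, w)
        x = torch.max(x, dim=2).values.view(b, t, -1)   # max-pool over chars
        h, _ = self.lstm(x)                             # (b, t, hidden)
        return self.out(h)                              # next-word logits


# Tiny smoke test on random character ids.
model = CharCNNLSTMLM()
logits = model(torch.randint(0, 256, (2, 8, 12)))
print(logits.shape)  # torch.Size([2, 8, 100000])
```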
Benchmarks
| Benchmark | Methodology | Number of params | PPL |
|---|---|---|---|
| language-modelling-on-one-billion-word | LSTM-8192-1024 + CNN Input | 1.04B | 30.0 |
| language-modelling-on-one-billion-word | LSTM-8192-1024 | 1.8B | 30.6 |
| language-modelling-on-one-billion-word | 10 LSTM+CNN inputs + SNM10-SKIP (ensemble) | 43B | 23.7 |
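The PPL column reports per-word perplexity, the exponential of the average per-word negative log-likelihood on the test set. As a small illustrative note (the loss value below is hypothetical, chosen only to land near the best single-model score):

```python
# Illustrative only: how a per-word perplexity like those in the table
# follows from a model's average cross-entropy loss (natural log).
import math

avg_neg_log_likelihood = 3.4012  # hypothetical mean -log p(word) over a test set
perplexity = math.exp(avg_neg_log_likelihood)
print(round(perplexity, 1))  # ~30.0, the scale reported for the best single model
```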