Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang; Zihang Dai; Ruslan Salakhutdinov; William W. Cohen

Abstract
We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
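The "simple and effective method" referred to in the abstract is a Mixture of Softmaxes (MoS) output layer, which appears as AWD-LSTM-MoS in the benchmark table below. A single softmax computes its logits as H W^T, so the log-probability matrix it can express has rank at most the hidden size d; mixing several softmaxes in probability space, with context-dependent mixture weights, removes that rank bound. The sketch below is a minimal PyTorch illustration of such an output layer, not the authors' released implementation; the class name, layer shapes, and the choice of 5 mixture components are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSoftmaxes(nn.Module):
    """Minimal sketch of a Mixture-of-Softmaxes (MoS) output layer.

    A single softmax over h @ W.T bounds the rank of the log-probability
    matrix by the hidden size d; MoS mixes K softmaxes whose weights are
    themselves context-dependent, lifting that rank limit.
    """

    def __init__(self, d_hidden, vocab_size, n_components=5):
        super().__init__()
        self.n_components = n_components
        # mixture weights pi_k, conditioned on the context vector
        self.prior = nn.Linear(d_hidden, n_components)
        # K separate projections of the context vector
        self.latent = nn.Linear(d_hidden, n_components * d_hidden)
        # shared output (word embedding) matrix across all components
        self.decoder = nn.Linear(d_hidden, vocab_size)

    def forward(self, hidden):
        # hidden: (batch, d_hidden)
        batch, d_hidden = hidden.size()
        # mixture weights: (batch, K)
        pi = F.softmax(self.prior(hidden), dim=-1)
        # per-component context vectors h_k: (batch, K, d_hidden)
        h_k = torch.tanh(self.latent(hidden)).view(batch, self.n_components, d_hidden)
        # per-component word distributions: (batch, K, vocab)
        softmaxes = F.softmax(self.decoder(h_k), dim=-1)
        # mix in probability space, then take log for an NLL loss
        probs = torch.einsum("bk,bkv->bv", pi, softmaxes)
        return torch.log(probs + 1e-8)


# usage sketch with hypothetical sizes
mos = MixtureOfSoftmaxes(d_hidden=256, vocab_size=10000, n_components=5)
log_probs = mos(torch.randn(32, 256))                     # (32, 10000)
loss = F.nll_loss(log_probs, torch.randint(0, 10000, (32,)))
```

Mixing in probability space (rather than averaging logits) is what matters: averaging logits would again yield a single softmax whose log-probability matrix is low-rank.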
Benchmarks
| Benchmark | Method | Params | Validation perplexity | Test perplexity |
|---|---|---|---|---|
| Penn Treebank (word-level) | AWD-LSTM-MoS + dynamic eval | 22M | 48.33 | 47.69 |
| Penn Treebank (word-level) | AWD-LSTM-MoS | 22M | 56.54 | 54.44 |
| WikiText-2 | AWD-LSTM-MoS + dynamic eval | 35M | 42.41 | 40.68 |
| WikiText-2 | AWD-LSTM-MoS | 35M | 63.88 | 61.45 |