Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang; Zihang Dai; Ruslan Salakhutdinov; William W. Cohen

Abstract
We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
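The "simple and effective method" referred to in the abstract is a Mixture of Softmaxes (MoS) output layer, which appears as AWD-LSTM-MoS in the benchmark table below. A single softmax computes its logits as H W^T, so the log-probability matrix it can express has rank at most the hidden size d; mixing several softmaxes in probability space, with context-dependent mixture weights, removes that rank bound. The sketch below is a minimal PyTorch illustration of such an output layer, not the authors' released implementation; the class name, layer shapes, and the choice of 5 mixture components are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSoftmaxes(nn.Module):
    """Minimal sketch of a Mixture-of-Softmaxes (MoS) output layer.

    A single softmax over h @ W.T bounds the rank of the log-probability
    matrix by the hidden size d; MoS mixes K softmaxes whose weights are
    themselves context-dependent, lifting that rank limit.
    """

    def __init__(self, d_hidden, vocab_size, n_components=5):
        super().__init__()
        self.n_components = n_components
        # mixture weights pi_k, conditioned on the context vector
        self.prior = nn.Linear(d_hidden, n_components)
        # K separate projections of the context vector
        self.latent = nn.Linear(d_hidden, n_components * d_hidden)
        # shared output (word embedding) matrix across all components
        self.decoder = nn.Linear(d_hidden, vocab_size)

    def forward(self, hidden):
        # hidden: (batch, d_hidden)
        batch, d_hidden = hidden.size()
        # mixture weights: (batch, K)
        pi = F.softmax(self.prior(hidden), dim=-1)
        # per-component context vectors h_k: (batch, K, d_hidden)
        h_k = torch.tanh(self.latent(hidden)).view(batch, self.n_components, d_hidden)
        # per-component word distributions: (batch, K, vocab)
        softmaxes = F.softmax(self.decoder(h_k), dim=-1)
        # mix in probability space, then take log for an NLL loss
        probs = torch.einsum("bk,bkv->bv", pi, softmaxes)
        return torch.log(probs + 1e-8)


# usage sketch with hypothetical sizes
mos = MixtureOfSoftmaxes(d_hidden=256, vocab_size=10000, n_components=5)
log_probs = mos(torch.randn(32, 256))                     # (32, 10000)
loss = F.nll_loss(log_probs, torch.randint(0, 10000, (32,)))
```

Mixing in probability space (rather than averaging logits) is what matters: averaging logits would again yield a single softmax whose log-probability matrix is low-rank.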
Benchmarks
| Benchmark | Method | Params | Validation perplexity | Test perplexity |
|---|---|---|---|---|
| Penn Treebank (word-level) | AWD-LSTM-MoS + dynamic eval | 22M | 48.33 | 47.69 |
| Penn Treebank (word-level) | AWD-LSTM-MoS | 22M | 56.54 | 54.44 |
| WikiText-2 | AWD-LSTM-MoS + dynamic eval | 35M | 42.41 | 40.68 |
| WikiText-2 | AWD-LSTM-MoS | 35M | 63.88 | 61.45 |