Language Modelling on WikiText-2
Evaluation Metrics
Number of params
Test perplexity
Validation perplexity
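For readers unfamiliar with the perplexity metrics listed above, the following is a minimal sketch of how perplexity is commonly computed: the exponential of the mean negative log-likelihood per token. The function name `perplexity` and the example probabilities are hypothetical illustrations, not taken from any model in this benchmark. The papers listed may also differ in tokenization and evaluation details, so treat this as an assumption-laden sketch rather than the benchmark's exact procedure.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over a token sequence.

    token_log_probs: natural-log probabilities that a model assigned to
    each target token (hypothetical input format for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: a model that assigns probability 0.25 to every
# target token is as uncertain as a uniform choice among 4 tokens.
example_log_probs = [math.log(0.25)] * 4
print(perplexity(example_log_probs))
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why lower values in the table indicate better results.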
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model name | Number of params | Test perplexity | Validation perplexity |
---|---|---|---|
improving-neural-language-modeling-via | 35M | 38.65 | 40.27 |
direct-output-connection-for-a-high-rank | 37M | 58.03 | 60.29 |
mogrifier-lstm | 35M | 55.1 | 57.3 |
hydra-a-system-for-large-multi-model-deep | 1542M | 15.17 | 15.69 |
dynamic-evaluation-of-neural-sequence-models | 33M | 44.3 | 46.4 |
alleviating-sequence-information-loss-with | 33M | 64.73 | 67.47 |
improving-neural-language-models-with-a | - | 99.3 | - |
massive-language-models-can-be-accurately | - | 234.77 | - |
language-models-are-unsupervised-multitask | 345M | 22.76 | - |
massive-language-models-can-be-accurately | - | 8.21 | - |
frage-frequency-agnostic-word-representation | 35M | 39.14 | 40.85 |
gradual-learning-of-recurrent-neural-networks | 38M | 40.46 | 42.19 |
learning-associative-inference-using-fast-1 | 37M | 61.65 | 54.48 |
language-models-are-unsupervised-multitask | 762M | 19.93 | - |
deep-residual-output-layers-for-neural | 34M | 61.9 | 64.9 |
language-models-are-unsupervised-multitask | 1542M | 18.34 | - |
regularizing-and-optimizing-lstm-language | 33M | 52.0 | 53.8 |
on-the-state-of-the-art-of-evaluation-in | 24M | 65.9 | 69.3 |
tying-word-vectors-and-word-classifiers-a | - | 87.7 | 92.3 |
breaking-the-softmax-bottleneck-a-high-rank | 35M | 40.68 | 42.41 |
regularizing-and-optimizing-lstm-language | 33M | 65.8 | 68.6 |
breaking-the-softmax-bottleneck-a-high-rank | 35M | 61.45 | 63.88 |
partially-shuffling-the-training-data-to-1 | 35M | 59.98 | 62.38 |
direct-output-connection-for-a-high-rank | 185M | 53.09 | 54.19 |
tying-word-vectors-and-word-classifiers-a | - | 87.0 | 91.5 |
fraternal-dropout | 34M | 64.1 | 66.8 |
partially-shuffling-the-training-data-to-1 | 37M | 57.85 | 60.16 |
egru-event-based-gru-for-activity-sparse | - | 68.9 | - |
language-models-are-unsupervised-multitask | 117M | 29.41 | - |
deep-residual-output-layers-for-neural | 34M | 42.0 | 43.9 |
improved-language-modeling-by-decoding-the | 35M | 40.3 | 42.0 |
massive-language-models-can-be-accurately | - | 8.34 | - |
improving-neural-language-models-with-a | - | 68.9 | - |
massive-language-models-can-be-accurately | - | 8.73 | - |
massive-language-models-can-be-accurately | - | 8.45 | - |
advancing-state-of-the-art-in-language | - | 53.73 | 55.4 |
190409408 | 395M | 34.1 | 37.7 |
mogrifier-lstm | 35M | 38.6 | 40.2 |