HyperAI
Language Modelling on WikiText-2
Evaluation metrics
Number of params
Test perplexity
Validation perplexity
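Both perplexity metrics above are the exponential of the average per-token negative log-likelihood on the corresponding split (word-level for WikiText-2); lower is better. A minimal sketch of that computation — an illustrative helper, not the site's evaluation code:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_log_probs` holds the natural-log probability the model
    assigns to each ground-truth token in the evaluation split.
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model assigning probability 1/2 to every token has perplexity 2.
print(perplexity([math.log(0.5)] * 4))  # → 2.0
```

Intuitively, a perplexity of 65.8 (AWD-LSTM's test score) means the model is, on average, as uncertain as if it were choosing uniformly among about 66 words at each step.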
Results
Performance of each model on this benchmark, sorted by test perplexity (lower is better).
| Model | Params | Test perplexity | Validation perplexity | Paper |
|---|---|---|---|---|
| OPT-175B (50% Sparsity) | - | 234.77 | - | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot |
| Grave et al. (2016) - LSTM | - | 99.3 | - | Improving Neural Language Models with a Continuous Cache |
| Inan et al. (2016) - Variational LSTM (tied) (h=650) | - | 87.7 | 92.3 | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling |
| Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss | - | 87.0 | 91.5 | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling |
| EGRU | - | 68.9 | - | Efficient recurrent architectures through activity sparsity and sparse back-propagation through time |
| Grave et al. (2016) - LSTM + continuous cache pointer | - | 68.9 | - | Improving Neural Language Models with a Continuous Cache |
| Melis et al. (2017) - 1-layer LSTM (tied) | 24M | 65.9 | 69.3 | On the State of the Art of Evaluation in Neural Language Models |
| AWD-LSTM | 33M | 65.8 | 68.6 | Regularizing and Optimizing LSTM Language Models |
| AWD-LSTM + ATOI | 33M | 64.73 | 67.47 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes |
| AWD-LSTM 3-layer with Fraternal dropout | 34M | 64.1 | 66.8 | Fraternal Dropout |
| AWD-LSTM-DRILL | 34M | 61.9 | 64.9 | Deep Residual Output Layers for Neural Language Generation |
| AWD-FWM Schlag et al. (2020) | 37M | 61.65 | 54.48 | Learning Associative Inference Using Fast Weight Memory |
| AWD-LSTM-MoS | 35M | 61.45 | 63.88 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
| AWD-LSTM-MoS + Partial Shuffle | 35M | 59.98 | 62.38 | Partially Shuffling the Training Data to Improve Language Models |
| AWD-LSTM-DOC | 37M | 58.03 | 60.29 | Direct Output Connection for a High-Rank Language Model |
| AWD-LSTM-DOC + Partial Shuffle | 37M | 57.85 | 60.16 | Partially Shuffling the Training Data to Improve Language Models |
| Mogrifier LSTM | 35M | 55.1 | 57.3 | Mogrifier LSTM |
| Ensemble of All | - | 53.73 | 55.4 | Advancing State of the Art in Language Modeling |
| AWD-LSTM-DOC x5 | 185M | 53.09 | 54.19 | Direct Output Connection for a High-Rank Language Model |
| AWD-LSTM + continuous cache pointer | 33M | 52.0 | 53.8 | Regularizing and Optimizing LSTM Language Models |