HyperAI
Language Modelling on Penn Treebank (Word Level)
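Evaluation metrics
Params: number of model parameters
Test perplexity: perplexity on the test set (lower is better)
Validation perplexity: perplexity on the validation set (lower is better)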
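Perplexity is the exponential of the average per-word negative log-likelihood that a model assigns to held-out text, so a lower value means the model is less "surprised" by the data. Equivalently, log2(perplexity) is the cross-entropy in bits per word. The sketch below assumes you already have per-token natural-log probabilities log p(w_i | w_<i) from some model; the helper names `perplexity` and `bits_per_word` are hypothetical and not part of any leaderboard tooling.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from natural-log probabilities log p(w_i | w_<i), one per token."""
    if len(token_log_probs) == 0:
        raise ValueError("need at least one token")
    # Average negative log-likelihood per token, measured in nats.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


def bits_per_word(ppl: float) -> float:
    """Cross-entropy in bits per word that corresponds to a given perplexity."""
    return math.log2(ppl)


if __name__ == "__main__":
    # Toy example: a model that gives every word probability 1/4 is as uncertain
    # as a uniform 4-way choice, so its perplexity is 4 (2 bits per word).
    logps = [math.log(0.25)] * 10
    ppl = perplexity(logps)
    print(ppl)                  # approximately 4.0
    print(bits_per_word(ppl))   # approximately 2.0
```

Read this way, a test perplexity of 55.3 means the model is, on average, about as uncertain as a uniform choice among roughly 55 words.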
Evaluation results
Results of each model on this benchmark.
| Model | Params | Test perplexity | Validation perplexity | Paper |
|---|---|---|---|---|
| TCN | 14.7M | 108.47 | - | Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling |
| Seq-U-Net | 14.9M | 107.95 | - | Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling |
| GRU (Bai et al., 2018) | - | 92.48 | - | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| R-Transformer | - | 84.38 | - | R-Transformer: Recurrent Neural Network Enhanced Transformer |
| Zaremba et al. (2014) - LSTM (medium) | - | 82.7 | 86.2 | Recurrent Neural Network Regularization |
| Gal & Ghahramani (2016) - Variational LSTM (medium) | - | 79.7 | 81.9 | A Theoretically Grounded Application of Dropout in Recurrent Neural Networks |
| LSTM (Bai et al., 2018) | - | 78.93 | - | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| Zaremba et al. (2014) - LSTM (large) | - | 78.4 | 82.2 | Recurrent Neural Network Regularization |
| Gal & Ghahramani (2016) - Variational LSTM (large) | - | 75.2 | 77.9 | A Theoretically Grounded Application of Dropout in Recurrent Neural Networks |
| Inan et al. (2016) - Variational RHN | - | 66.0 | 68.1 | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling |
| Recurrent highway networks | 23M | 65.4 | 67.9 | Recurrent Highway Networks |
| NAS-RL | 25M | 64.0 | - | Neural Architecture Search with Reinforcement Learning |
| Efficient NAS | 24M | 58.6 | 60.8 | Efficient Neural Architecture Search via Parameter Sharing |
| AWD-LSTM | 24M | 57.3 | 60.0 | Regularizing and Optimizing LSTM Language Models |
| DEQ-TrellisNet | 24M | 57.1 | - | Deep Equilibrium Models |
| AWD-LSTM 3-layer with Fraternal dropout | 24M | 56.8 | 58.9 | Fraternal Dropout |
| Dense IndRNN | - | 56.37 | - | Deep Independently Recurrent Neural Network (IndRNN) |
| Differentiable NAS | 23M | 56.1 | 58.3 | DARTS: Differentiable Architecture Search |
| AWD-LSTM-DRILL | 24M | 55.7 | 58.2 | Deep Residual Output Layers for Neural Language Generation |
| 2-layer skip-LSTM + dropout tuning | 24M | 55.3 | 57.1 | Pushing the bounds of dropout |
This page lists 20 of the 43 entries on the leaderboard. A dash (-) marks a value that the leaderboard does not report.
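Two details matter when reading this table programmatically: a dash marks a value missing from the leaderboard, and parameter counts carry an M (million) suffix. The minimal sketch below assumes rows are copied by hand into a Python list (only five rows are included, and the names `Entry`, `parse_optional_number`, `parse_rows` and `best_under_budget` are hypothetical). It ranks entries by test perplexity only among those that report a size, so that models of unknown capacity are not compared against a parameter budget.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A few rows copied from the table: (model, params, test ppl, validation ppl).
ROWS: List[Tuple[str, str, str, str]] = [
    ("AWD-LSTM", "24M", "57.3", "60.0"),
    ("DEQ-TrellisNet", "24M", "57.1", "-"),
    ("Dense IndRNN", "-", "56.37", "-"),
    ("Differentiable NAS", "23M", "56.1", "58.3"),
    ("2-layer skip-LSTM + dropout tuning", "24M", "55.3", "57.1"),
]


@dataclass
class Entry:
    model: str
    params: Optional[float]   # None when the table shows "-"
    test_ppl: float
    val_ppl: Optional[float]  # None when the table shows "-"


def parse_optional_number(text: str) -> Optional[float]:
    """Parse '57.3' or '24M' (M = million); return None for '-'."""
    text = text.strip()
    if text == "-":
        return None
    scale = {"K": 1e3, "M": 1e6, "B": 1e9}
    if text[-1] in scale:
        return float(text[:-1]) * scale[text[-1]]
    return float(text)


def parse_rows(rows: List[Tuple[str, str, str, str]]) -> List[Entry]:
    """Turn raw string rows into typed entries."""
    return [
        Entry(m, parse_optional_number(p), float(t), parse_optional_number(v))
        for m, p, t, v in rows
    ]


def best_under_budget(entries: List[Entry], max_params: float) -> Entry:
    """Lowest test perplexity among entries whose reported size fits the budget."""
    eligible = [e for e in entries if e.params is not None and e.params <= max_params]
    if not eligible:
        raise ValueError("no entry with a reported size within the budget")
    return min(eligible, key=lambda e: e.test_ppl)


if __name__ == "__main__":
    entries = parse_rows(ROWS)
    print(best_under_budget(entries, max_params=25e6).model)
    # prints: 2-layer skip-LSTM + dropout tuning
```

Entries without a reported size, such as Dense IndRNN, are skipped rather than assumed to fit the budget; change that filter if you prefer a different policy.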