Language Modelling On Text8
Evaluation metric: Bit per Character (BPC)
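BPC is the average negative log-likelihood the model assigns to each character of the test text, expressed in bits (base-2 logarithm); lower is better. The sketch below shows one common way to compute it from per-character log-probabilities; the function and variable names are illustrative and not taken from any particular benchmark harness.

```python
import math

def bits_per_character(char_log_probs):
    """Compute Bit per Character (BPC) from the natural-log probabilities
    a model assigns to each character of the evaluation text.

    BPC = -(1/N) * sum_i log2 p(c_i | c_<i)
    """
    n = len(char_log_probs)
    # Convert natural log to log base 2 and average the negative log-likelihood.
    return -sum(lp / math.log(2) for lp in char_log_probs) / n

# Toy usage: a model that assigns probability 0.5 to every character
# scores exactly 1.0 BPC.
example = [math.log(0.5)] * 100
print(bits_per_character(example))  # -> 1.0
```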
Evaluation results: performance of each model on this benchmark.
| Model Name | Bit per Character (BPC) | Paper Title | Repository |
|---|---|---|---|
| td-LSTM (Zhang et al., 2016) | 1.63 | Architectural Complexity Measures of Recurrent Neural Networks | - |
| td-LSTM-large | 1.49 | Architectural Complexity Measures of Recurrent Neural Networks | - |
| BFN | 1.41 | Bayesian Flow Networks | |
| Unregularised mLSTM | 1.40 | Multiplicative LSTM for sequence modelling | |
| BN LSTM | 1.36 | Recurrent Batch Normalization | |
| LayerNorm HM-LSTM | 1.29 | Hierarchical Multiscale Recurrent Neural Networks | |
| Large mLSTM +emb +WN +VD | 1.27 | Multiplicative LSTM for sequence modelling | |
| Large RHN | 1.27 | Recurrent Highway Networks | |
| Bipartite flows (8 flows) | 1.23 | Discrete Flows: Invertible Generative Models of Discrete Data | |
| mLSTM + dynamic eval | 1.19 | Dynamic Evaluation of Neural Sequence Models | |
| 12-layer Character Transformer Model | 1.18 | Character-Level Language Modeling with Deeper Self-Attention | |
| PAR Transformer 24B | 1.18 | Pay Attention when Required | |
| GAM-RHN-10 | 1.157 | Recurrent Highway Networks with Grouped Auxiliary Memory | - |
| 64-layer Character Transformer Model | 1.13 | Character-Level Language Modeling with Deeper Self-Attention | |
| 12L Transformer + 8K adaptive span | 1.11 | Adaptive Attention Span in Transformers | |
| BP-Transformer - 12 Layers | 1.11 | BP-Transformer: Modelling Long-Range Context via Binary Partitioning | |
| All-attention network - 18 layers | 1.11 | Augmenting Self-attention with Persistent Memory | |
| Transformer-LS (small) | 1.09 | Long-Short Transformer: Efficient Transformers for Language and Vision | |
| All-attention network - 36 layers | 1.08 | Augmenting Self-attention with Persistent Memory | |
| Transformer-XL - 24 layers | 1.08 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |