Language Modelling
Language modeling is the task of predicting the next word or character in a document, and trained language models can be applied to a range of natural language processing tasks such as text generation, text classification, and question answering. Since the 2010s, neural language models have replaced n-gram models, and since the early 2020s large language models (LLMs) have dominated state-of-the-art results. Model quality is typically evaluated with cross-entropy and perplexity, on common datasets including WikiText-103, One Billion Word, Text8, C4, and The Pile.
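As a concrete illustration of how the two metrics relate, the short Python sketch below (using made-up token probabilities, not results from any listed model) computes perplexity as the exponential of the average per-token cross-entropy.

```python
import math

def perplexity(token_log_probs):
    """token_log_probs: natural-log probabilities the model assigned
    to each ground-truth token in the evaluation text."""
    # Cross-entropy in nats = average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of the cross-entropy.
    return math.exp(avg_nll)

# Example: a model assigning probability 0.25 to each of four tokens
# has cross-entropy ln(4) ≈ 1.386 nats and perplexity 4.
print(perplexity([math.log(0.25)] * 4))  # -> 4.0
```

Lower perplexity means the model spreads less probability mass away from the observed text; reported numbers are only comparable when computed over the same tokenization and dataset.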
Leaderboard entries for this task span benchmarks such as MMLU and include models like GPT-3 (175B, few-shot and zero-shot), GPT-2 (including the 48-layer, h=1600 variant), GPT2-Hermite, GLM-130B (including 3-shot), Gopher, RETRO (7.5B), SparseGPT (175B, 50% sparsity), Transformer-XL + RMS dynamic eval, Mogrifier LSTM + dynamic eval, Transformer-LS (small), PAR Transformer 24B, Primer, FLASH-Quad-8k, MDLM (AR baseline), I-DARTS, Spirit-LM (Expr.), Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B), and Hybrid 4-gram VietMed-Train + ExtraText.