Language Models are Unsupervised Multitask Learners

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever

Abstract

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset - matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples. The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text. These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
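As a rough illustration of the zero-shot behaviour the abstract describes, the sketch below conditions a publicly released GPT-2 checkpoint on a short document plus a question and reads the generated continuation as the answer. It assumes the Hugging Face `transformers` library and the small public `gpt2` checkpoint; the prompt template is an illustrative choice, not the exact conditioning format used in the paper.

```python
# Illustrative zero-shot question answering by conditioning GPT-2 on a
# document plus a question. Assumes the Hugging Face `transformers` library
# and the public `gpt2` checkpoint; the prompt format is a plausible example,
# not the paper's exact setup.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

document = "The Eiffel Tower was completed in 1889 and stands in Paris."
question = "When was the Eiffel Tower completed?"
prompt = f"{document}\nQ: {question}\nA:"

inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of a short continuation; the text generated after "A:"
# is read as the model's answer.
output_ids = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
answer = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:])
print(answer.strip())
```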

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| coreference-resolution-on-winograd-schema | GPT-2-XL 1.5B | Accuracy: 70.7 |
| dialogue-state-tracking-on-simmc2-0 | GPT-2 | Act F1: 94.5, Slot F1: 81.7 |
| document-summarization-on-cnn-daily-mail | GPT-2 | ROUGE-1: 29.34, ROUGE-2: 8.27, ROUGE-L: 26.58 |
| language-modelling-on-enwiki8 | GPT-2 (48 layers, h=1600) | Bits per Character (BPC): 0.93, Params: 1542M |
| language-modelling-on-lambada | GPT-2 1.5B (Zero Shot) | Accuracy: 63.24, Perplexity: 8.63 |
| language-modelling-on-one-billion-word | GPT-2 | Params: 1.54B, PPL: 42.16 |
| language-modelling-on-penn-treebank-word | GPT-2 | Params: 1542M, Test perplexity: 35.76 |
| language-modelling-on-text8 | GPT-2 | Bits per Character (BPC): 0.98, Params: 1542M |
| language-modelling-on-wikitext-103 | GPT-2 Small | Params: 124M, Test perplexity: 37.50 |
| language-modelling-on-wikitext-103 | GPT-2 Medium | Params: 355M, Test perplexity: 26.37 |
| language-modelling-on-wikitext-103 | GPT-2 Large | Params: 774M, Test perplexity: 22.05 |
| language-modelling-on-wikitext-103 | GPT-2 Full | Params: 1542M, Test perplexity: 17.48 |
| language-modelling-on-wikitext-2 | GPT-2 (small) | Params: 117M, Test perplexity: 29.41 |
| language-modelling-on-wikitext-2 | GPT-2 (medium) | Params: 345M, Test perplexity: 22.76 |
| language-modelling-on-wikitext-2 | GPT-2 (large) | Params: 762M, Test perplexity: 19.93 |
| language-modelling-on-wikitext-2 | GPT-2 | Params: 1542M, Test perplexity: 18.34 |
| question-answering-on-fever | Zero-shot | EM: 50 |
| question-answering-on-webquestions | Zero-shot | EM: 43 |
| response-generation-on-simmc2-0 | GPT-2 | BLEU: 19.2 |
| sentiment-analysis-on-imdb | GPT-2 Finetuned | Accuracy: 92.36 |
| text-generation-on-openwebtext | GPT2-124M | eval_loss: 3.12 |
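The perplexity and bits-per-character figures in the table come from standard language-model evaluation: average the model's negative log-likelihood over a held-out test set, then exponentiate per token (perplexity) or convert to bits per character (BPC). The sketch below shows that arithmetic on a single string, assuming the Hugging Face `transformers` library and the public `gpt2` checkpoint rather than the 1.5B model reported above, and ignoring the sliding-window and detokenization details behind the official numbers.

```python
# Minimal sketch of perplexity and bits-per-character evaluation for a causal
# language model. Assumes the Hugging Face `transformers` library and the
# public `gpt2` checkpoint (not the 1.5B model reported in the table above).
import math
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "Natural language processing tasks are typically approached with supervised learning."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels equal to input_ids, the model returns the mean cross-entropy
    # (in nats) over the predicted tokens.
    out = model(**enc, labels=enc["input_ids"])
nll_per_token = out.loss.item()

n_predicted_tokens = enc["input_ids"].shape[1] - 1  # labels are shifted by one
n_chars = len(text)

perplexity = math.exp(nll_per_token)                          # per-token perplexity
bpc = (nll_per_token * n_predicted_tokens) / (n_chars * math.log(2))  # bits per character

print(f"perplexity per token: {perplexity:.2f}")
print(f"bits per character:   {bpc:.3f}")
```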
