Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Jack W. Rae; Sebastian Borgeaud; Trevor Cai; Katie Millican; Jordan Hoffmann; Francis Song; John Aslanides; Sarah Henderson; Roman Ring; Susannah Young; Eliza Rutherford; Tom Hennigan; Jacob Menick; Albin Cassirer; Richard Powell; George van den Driessche; Lisa Anne Hendricks; Maribeth Rauh; Po-Sen Huang; Amelia Glaese; Johannes Welbl; Sumanth Dathathri; Saffron Huang; Jonathan Uesato; John Mellor; Irina Higgins; Antonia Creswell; Nat McAleese; Amy Wu; Erich Elsen; Siddhant Jayakumar; Elena Buchatskaya; David Budden; Esme Sutherland; Karen Simonyan; Michela Paganini; Laurent Sifre; Lena Martens; Xiang Lorraine Li; Adhiguna Kuncoro; Aida Nematzadeh; Elena Gribovskaya; Domenic Donato; Angeliki Lazaridou; Arthur Mensch; Jean-Baptiste Lespiau; Maria Tsimpoukelli; Nikolai Grigorev; Doug Fritz; Thibault Sottiaux; Mantas Pajarskas; Toby Pohlen; Zhitao Gong; Daniel Toyama; Cyprien de Masson d'Autume; Yujia Li; Tayfun Terzi; Vladimir Mikulik; Igor Babuschkin; Aidan Clark; Diego de Las Casas; Aurelia Guy; Chris Jones; James Bradbury; Matthew Johnson; Blake Hechtman; Laura Weidinger; Iason Gabriel; William Isaac; Ed Lockhart; Simon Osindero; Laura Rimell; Chris Dyer; Oriol Vinyals; Kareem Ayoub; Jeff Stanway; Lorrayne Bennett; Demis Hassabis; Koray Kavukcuoglu; Geoffrey Irving

Abstract

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales, from models with tens of millions of parameters up to a 280-billion-parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance on the majority of them. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and the model's behaviour, covering the intersection of model scale with bias and toxicity. Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.

Code Repositories

allenai/dolma (mentioned in GitHub)
bramiozo/PubScience (mentioned in GitHub)
rvlopes/gloria, PyTorch (mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| abstract-algebra-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 25.0 |
| analogical-similarity-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 17.2 |
| analytic-entailment-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 53.0 |
| anatomy-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 56.3 |
| astronomy-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 65.8 |
| business-ethics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 70.0 |
| clinical-knowledge-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 67.2 |
| college-biology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 70.8 |
| college-chemistry-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 45.0 |
| college-computer-science-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 49 |
| college-mathematics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 37.0 |
| college-medicine-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 60.1 |
| college-physics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 34.3 |
| common-sense-reasoning-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 45.5 |
| common-sense-reasoning-on-big-bench-causal | Gopher-280B (few-shot, k=5) | Accuracy: 50.8 |
| common-sense-reasoning-on-big-bench-date | Gopher-280B (few-shot, k=5) | Accuracy: 44.1 |
| common-sense-reasoning-on-big-bench-known | Gopher-280B (few-shot, k=5) | Accuracy: 63.6 |
| common-sense-reasoning-on-big-bench-logical | Gopher-280B (few-shot, k=5) | Accuracy: 36.4 |
| common-sense-reasoning-on-big-bench-sports | Gopher-280B (few-shot, k=5) | Accuracy: 54.9 |
| common-sense-reasoning-on-big-bench-winowhy | Gopher-280B (few-shot, k=5) | Accuracy: 56.7 |
| common-sense-reasoning-on-winogrande | Gopher-280B (zero-shot) | Accuracy: 70.1 |
| computer-security-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 65.0 |
| conceptual-physics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 49.4 |
| crash-blossom-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 63.6 |
| crass-ai-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 56.8 |
| dark-humor-detection-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 83.1 |
| discourse-marker-prediction-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 11.7 |
| econometrics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 43 |
| electrical-engineering-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 60 |
| elementary-mathematics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 33.6 |
| empirical-judgments-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 52.5 |
| english-proverbs-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 57.6 |
| entailed-polarity-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 89.5 |
| epistemic-reasoning-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 56.4 |
| evaluating-information-essentiality-on-big | Gopher-280B (few-shot, k=5) | Accuracy: 16.7 |
| fantasy-reasoning-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 64.1 |
| fever-2-way-on-big-bench | Gopher-280B (few-shot, k=10) | Accuracy: 77.5 |
| fever-3-way-on-big-bench | Gopher-280B (few-shot, k=15) | Accuracy: 77.5 |
| figure-of-speech-detection-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 52.7 |
| formal-logic-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 35.7 |
| general-knowledge-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 93.9 |
| global-facts-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 38.0 |
| gre-reading-comprehension-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 27.3 |
| high-school-biology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 71.3 |
| high-school-chemistry-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 47.8 |
| high-school-computer-science-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 54.0 |
| high-school-european-history-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 72.1 |
| high-school-geography-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 76.8 |
| high-school-government-and-politics-on-big | Gopher-280B (few-shot, k=5) | Accuracy: 83.9 |
| high-school-macroeconomics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 65.1 |
| high-school-mathematics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 23.7 |
| high-school-microeconomics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 66.4 |
| high-school-physics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 33.8 |
| high-school-psychology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 81.8 |
| high-school-statistics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 50 |
| high-school-us-history-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 78.9 |
| high-school-world-history-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 75.1 |
| human-aging-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 66.4 |
| human-organs-senses-multiple-choice-on-big | Gopher-280B (few-shot, k=5) | Accuracy: 84.8 |
| human-sexuality-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 67.2 |
| identify-odd-metapor-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 38.6 |
| implicatures-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 62.0 |
| implicit-relations-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 36.4 |
| intent-recognition-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 88.7 |
| international-law-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 77.7 |
| irony-identification-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 69.7 |
| jurisprudence-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 71.3 |
| lambada-on-big-bench | Gopher-280B (zero-shot) | Accuracy: 74.5 |
| language-modelling-on-arxiv | Gopher | BPB: 0.662 |
| language-modelling-on-bookcorpus2 | Gopher | BPB: 0.741 |
| language-modelling-on-books3 | Gopher | BPB: 0.712 |
| language-modelling-on-curation-corpus | Gopher | BPB: 0.475 |
| language-modelling-on-dm-mathematics | Gopher | BPB: 1.14 |
| language-modelling-on-freelaw | Gopher | BPB: 0.513 |
| language-modelling-on-github | Gopher | BPB: 0.377 |
| language-modelling-on-gutenberg-pg-19 | Gopher | BPB: 0.656 |
| language-modelling-on-hackernews | Gopher | BPB: 0.890 |
| language-modelling-on-nih-exporter | Gopher | BPB: 0.590 |
| language-modelling-on-opensubtitles | Gopher | BPB: 0.899 |
| language-modelling-on-openwebtext2 | Gopher | BPB: 0.677 |
| language-modelling-on-philpapers | Gopher | BPB: 0.695 |
| language-modelling-on-pile-cc | Gopher | BPB: 0.691 |
| language-modelling-on-pubmed-abstracts | Gopher | BPB: 0.577 |
| language-modelling-on-pubmed-central | Gopher | BPB: 0.525 |
| language-modelling-on-stackexchange | Gopher | BPB: 0.641 |
| language-modelling-on-ubuntu-irc | Gopher | BPB: 1.09 |
| language-modelling-on-uspto-backgrounds | Gopher | BPB: 0.546 |
| logical-args-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 59.1 |
| logical-fallacies-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 72.4 |
| logical-reasoning-on-big-bench-formal | Gopher-280B (few-shot, k=5) | Accuracy: 50.7 |
| logical-reasoning-on-big-bench-logic-grid | Gopher-280B (few-shot, k=5) | Accuracy: 35.1 |
| logical-reasoning-on-big-bench-logical | Gopher-280B (few-shot, k=5) | Accuracy: 58.9 |
| logical-reasoning-on-big-bench-penguins-in-a | Gopher-280B (few-shot, k=5) | Accuracy: 40.6 |
| logical-reasoning-on-big-bench-reasoning | Gopher-280B (few-shot, k=5) | Accuracy: 49.2 |
| logical-reasoning-on-big-bench-strategyqa | Gopher-280B (few-shot, k=5) | Accuracy: 61.0 |
| logical-reasoning-on-big-bench-temporal | Gopher-280B (few-shot, k=5) | Accuracy: 19.0 |
| machine-learning-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 41.1 |
| management-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 77.7 |
| marketing-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 83.3 |
| mathematical-induction-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 57.6 |
| medical-genetics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 69.0 |
| memorization-on-big-bench-hindu-knowledge | Gopher-280B (few-shot, k=5) | Accuracy: 80 |
| metaphor-boolean-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 59.3 |
| miscellaneous-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 75.7 |
| misconceptions-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 61.7 |
| moral-disputes-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 66.8 |
| moral-permissibility-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 55.1 |
| moral-scenarios-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 40.2 |
| movie-dialog-same-or-different-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 50.7 |
| multi-task-language-understanding-on-mmlu | Gopher-7.1B (5-shot) | Average (%): 29.5 |
| multiple-choice-question-answering-mcqa-on-27 | Gopher-280B (few-shot, k=5) | Accuracy: 51.7 |
| multiple-choice-question-answering-mcqa-on-28 | Gopher-280B (few-shot, k=5) | Accuracy: 50.5 |
| multiple-choice-question-answering-mcqa-on-29 | Gopher-280B (few-shot, k=5) | Accuracy: 51.1 |
| multiple-choice-question-answering-mcqa-on-30 | Gopher-280B (few-shot, k=5) | Accuracy: 38.6 |
| multiple-choice-question-answering-mcqa-on-31 | Gopher-280B (few-shot, k=5) | Accuracy: 59.1 |
| natural-questions-on-big-bench | Gopher-280B (few-shot, k=64) | Accuracy: 28.2 |
| nonsense-words-grammar-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 61.4 |
| nutrition-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 69.9 |
| odd-one-out-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 32.5 |
| philosophy-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 68.8 |
| phrase-relatedness-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 81.8 |
| physical-intuition-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 59.7 |
| physics-mc-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 50.9 |
| prehistory-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 67.6 |
| presuppositions-as-nli-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 34.0 |
| professional-accounting-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 44.3 |
| professional-law-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 44.5 |
| professional-medicine-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 64.0 |
| professional-psychology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 68.1 |
| public-relations-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 71.8 |
| question-answering-on-boolq | Gopher (zero-shot) | Accuracy: 79.3 |
| question-answering-on-natural-questions | Gopher (few-shot, k=64) | EM: 28.2 |
| question-answering-on-piqa | Gopher-280B (zero-shot) | Accuracy: 81.8 |
| question-answering-on-social-iqa | Gopher (zero-shot) | Accuracy: 50.6 |
| question-answering-on-truthfulqa | Gopher-280B (zero-shot, QA prompts) | MC1: 0.27 |
| question-answering-on-truthfulqa | Gopher-7.1B (zero-shot, QA prompts) | MC1: 0.25 |
| question-answering-on-truthfulqa | Gopher-7.1B (zero-shot, Our Prompt + Choices) | MC1: 0.23 |
| question-answering-on-truthfulqa | Gopher-1.4B (zero-shot, QA prompts) | MC1: 0.23 |
| question-answering-on-truthfulqa | Gopher-280B (zero-shot, Our Prompt + Choices) | MC1: 0.295 |
| question-answering-on-truthfulqa | Gopher-1.4B (zero-shot, Our Prompt + Choices) | MC1: 0.217 |
| question-selection-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 41.4 |
| race-h-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 71.6 |
| race-m-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 75.1 |
| riddle-sense-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 68.2 |
| sarcasm-detection-on-big-bench-snarks | Gopher-280B (few-shot, k=5) | Accuracy: 48.3 |
| security-studies-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 64.9 |
| sentence-ambiguity-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 69.1 |
| similarities-abstraction-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 81.8 |
| sociology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 84.1 |
| timedial-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 50.9 |
| triviaqa-on-big-bench | Gopher-280B (few-shot, k=64) | Accuracy: 57.1 |
| understanding-fables-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 39.6 |
| us-foreign-policy-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 81.0 |
| virology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 47.0 |
| word-sense-disambiguation-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 56.4 |
| world-religions-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 84.2 |
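The language-modelling rows (arXiv, GitHub, PubMed Central and the other components of The Pile) report BPB, bits per byte. This metric divides a model's total negative log-likelihood on a document, converted to bits, by the document's length in UTF-8 bytes rather than by its token count. The sketch below shows only that unit conversion, under stated assumptions. The helper name `bits_per_byte` is hypothetical, per-token losses are assumed to be given in nats (natural log), and the paper's actual evaluation pipeline (document chunking, context length, tokenizer) is not reproduced here.

```python
import math
from collections.abc import Sequence


def bits_per_byte(token_nlls_nats: Sequence[float], text: str) -> float:
    """Convert per-token negative log-likelihoods into bits per UTF-8 byte.

    Hypothetical helper: assumes `token_nlls_nats` holds -ln p(token | context)
    for every token the model produced when scoring `text`.
    """
    n_bytes = len(text.encode("utf-8"))  # denominator counts bytes, not tokens
    if n_bytes == 0:
        raise ValueError("text must contain at least one byte")
    total_nats = math.fsum(token_nlls_nats)  # accurate sum of the losses
    total_bits = total_nats / math.log(2)  # change of base: nats -> bits
    return total_bits / n_bytes


if __name__ == "__main__":
    sample = "Gopher reads the Pile."
    # Made-up per-token losses, purely for illustration.
    losses = [2.1, 0.7, 1.3, 0.4, 3.0, 0.9]
    print(f"{bits_per_byte(losses, sample):.3f} bits per byte")
```

Because the denominator is the byte count, two models with different tokenizers that assign the same total probability to a document receive the same BPB, whereas per-token perplexity would differ. Lower BPB is better.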