SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

Elias Frantar, Dan Alistarh

Abstract

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
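To make the semi-structured (2:4) pattern mentioned in the abstract concrete, the sketch below applies a toy 2:4 mask to a weight matrix in PyTorch, keeping the two largest-magnitude weights in every consecutive group of four along each row. It illustrates only the sparsity pattern that SparseGPT targets, not the paper's error-correcting, calibration-based pruning algorithm; the function name `prune_2_4` and the magnitude-based selection are illustrative assumptions.

```python
# Illustrative sketch: enforce a 2:4 semi-structured sparsity pattern by keeping
# the 2 largest-magnitude entries in every group of 4 consecutive weights.
# This shows the *pattern* only, NOT the SparseGPT algorithm (which selects and
# updates weights using second-order information from calibration data).
import torch

def prune_2_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero out 2 of every 4 consecutive weights along the last dimension."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "2:4 pruning needs the input dim to be a multiple of 4"
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Rank the entries in each group of 4 by magnitude and keep the top 2.
    idx = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(-1, idx, True)
    return (groups * mask).reshape(out_features, in_features)

if __name__ == "__main__":
    w = torch.randn(8, 16)
    w_sparse = prune_2_4(w)
    # 50% of the weights are zero, laid out in the hardware-friendly 2:4 format.
    print("sparsity:", (w_sparse == 0).float().mean().item())
```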

Code Repositories

baithebest/sparsellm (PyTorch)
baithebest/adagp (PyTorch)
eth-easl/deltazip (PyTorch)
nvlabs/maskllm (PyTorch)
ist-daslab/sparsegpt (official, PyTorch)
nvidia/tensorrt-model-optimizer (PyTorch)

Benchmarks

Benchmark | Method | Metric | Value
--- | --- | --- | ---
Common Sense Reasoning on ARC-Challenge | OPT-175B | Accuracy | 43.94
Common Sense Reasoning on ARC-Challenge | OPT-175B (50% Sparsity) | Accuracy | 25.6
Common Sense Reasoning on ARC-Challenge | SparseGPT (175B, 50% Sparsity) | Accuracy | 41.3
Common Sense Reasoning on ARC-Challenge | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 38.99
Common Sense Reasoning on ARC-Challenge | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 39.85
Common Sense Reasoning on ARC-Easy | OPT-175B | Accuracy | 71.04
Common Sense Reasoning on ARC-Easy | OPT-175B (50% Sparsity) | Accuracy | 28.03
Common Sense Reasoning on ARC-Easy | SparseGPT (175B, 50% Sparsity) | Accuracy | 69.65
Common Sense Reasoning on ARC-Easy | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 67.08
Common Sense Reasoning on ARC-Easy | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 68.35
Language Modelling on LAMBADA | OPT-175B | Accuracy | 75.59
Language Modelling on LAMBADA | OPT-175B (50% Sparsity) | Accuracy | 0.02
Language Modelling on LAMBADA | SparseGPT (175B, 50% Sparsity) | Accuracy | 76.51
Language Modelling on LAMBADA | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 79.47
Language Modelling on LAMBADA | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 78.77
Language Modelling on WikiText-2 | OPT-175B | Test perplexity | 8.34
Language Modelling on WikiText-2 | OPT-175B (50% Sparsity) | Test perplexity | 234.77
Language Modelling on WikiText-2 | SparseGPT (175B, 50% Sparsity) | Test perplexity | 8.21
Language Modelling on WikiText-2 | SparseGPT (175B, 2:4 Sparsity) | Test perplexity | 8.73
Language Modelling on WikiText-2 | SparseGPT (175B, 4:8 Sparsity) | Test perplexity | 8.45
Question Answering on PIQA | OPT-175B | Accuracy | 81.07
Question Answering on PIQA | OPT-175B (50% Sparsity) | Accuracy | 54.73
Question Answering on PIQA | SparseGPT (175B, 50% Sparsity) | Accuracy | 80.63
Question Answering on PIQA | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 79.54
Question Answering on PIQA | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 79.54
Question Answering on StoryCloze | OPT-175B | Accuracy | 79.82
Question Answering on StoryCloze | OPT-175B (50% Sparsity) | Accuracy | 47.10
Question Answering on StoryCloze | SparseGPT (175B, 50% Sparsity) | Accuracy | 78.87
Question Answering on StoryCloze | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 76.19
Question Answering on StoryCloze | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 77.02
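For context on the WikiText-2 rows above, test perplexity is conventionally reported as the exponential of the model's mean token-level cross-entropy on the test set. The sketch below shows that relationship for a Hugging Face style causal language model; it is a generic illustration under that assumption, not the paper's evaluation harness, and the non-overlapping windowing scheme is a simplification.

```python
# Generic sketch: perplexity = exp(mean token-level cross-entropy) over a test set.
# Assumes a Hugging Face style causal LM whose forward pass returns an object with
# a `.logits` field; NOT the paper's evaluation code.
import math
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor, stride: int = 2048) -> float:
    """Compute perplexity over a long token sequence in non-overlapping windows."""
    nll_sum, token_count = 0.0, 0
    for start in range(0, input_ids.size(1) - 1, stride):
        window = input_ids[:, start : start + stride + 1]
        logits = model(window[:, :-1]).logits          # next-token predictions
        targets = window[:, 1:]
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="sum"
        )
        nll_sum += loss.item()
        token_count += targets.numel()
    return math.exp(nll_sum / token_count)
```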
