| Model | Score | Paper | Code |
|---|---|---|---|
| FLAN 137B (few-shot, k=10) | 94.7 | Finetuned Language Models Are Zero-Shot Learners | - |
| SparseGPT (175B, 50% Sparsity) | 78.87 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | - |
| Memory chains and semantic supervision | 78.7 | - | - |
| SparseGPT (175B, 4:8 Sparsity) | 77.02 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | - |
| SparseGPT (175B, 2:4 Sparsity) | 76.19 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | - |
| sMLP – deterministic 9.4B (0-shot) | 74.7 | Efficient Language Modeling with Sparse all-MLP | - |
| GPT-3 Large 760M (zero-shot) | 72.4 | Language Models are Few-Shot Learners | - |
| HASH Layers 10B (0-shot) | 64.7 | Efficient Language Modeling with Sparse all-MLP | - |
| Base Layers 10B (0-shot) | 61.4 | Efficient Language Modeling with Sparse all-MLP | - |