RoBERTa-Winogrande-ft 355M (fine-tuned) | 90.6 | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | - |
HASH Layers 10B (0-shot) | 64 | Efficient Language Modeling with Sparse all-MLP | - |
RoBERTa-ft 355M (fine-tuned) | 86.4 | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | - |
GPT-3 175B (few-shot, k=32) | 92 | Language Models are Few-Shot Learners | - |
Hybrid H3 125M (0-shot, rank classification) | 67 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | - |
H3 125M (0-shot, rank classification) | 51 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | - |
Causal Strength Computation (on ClueWeb12) | 69.9 | - | - |
ST-MoE-32B 269B (fine-tuned) | 99.2 | ST-MoE: Designing Stable and Transferable Sparse Expert Models | - |
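The "0-shot, rank classification" rows above are evaluated without any task-specific training: the language model scores each answer candidate by its log-likelihood given the prompt, and the highest-scoring candidate counts as the prediction. The sketch below illustrates that protocol only in broad strokes; the `gpt2` checkpoint, the helper names, and the toy prompt are illustrative assumptions, not the exact setups used in the cited papers.

```python
# Minimal sketch of zero-shot rank classification with a causal LM.
# Assumptions: Hugging Face transformers + PyTorch; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token, conditioned on all preceding tokens.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_logprobs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the tokens belonging to the completion (simplified: assumes the
    # concatenated tokenization splits cleanly at the prompt boundary).
    completion_len = full_ids.shape[1] - prompt_ids.shape[1]
    return token_logprobs[0, -completion_len:].sum().item()


def rank_classify(prompt: str, candidates: list[str]) -> str:
    """Return the candidate completion with the highest log-likelihood."""
    return max(candidates, key=lambda c: completion_logprob(prompt, c))


# Toy cause/effect-style usage, purely illustrative:
print(rank_classify(
    "The man broke his toe because",
    [" he dropped a hammer on his foot.", " he got a hole in his sock."],
))
```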