Model | Accuracy (%) | Paper | Code
--- | --- | --- | ---
Chinchilla (zero-shot) | 51.3 | Training Compute-Optimal Large Language Models | - |
RoBERTa-Large 355M (fine-tuned) | 76.7 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | - |
CompassMTL 567M with Tailor | 82.2 | Task Compass: Scaling Multi-task Pre-training with Task Prefix | - |
LLaMA-3 8B+MoSLoRA (fine-tuned) | 81.0 | Mixture-of-Subspaces in Low-Rank Adaptation | - |
BERT-base 110M (fine-tuned) | 63.1 | SocialIQA: Commonsense Reasoning about Social Interactions | - |
BERT-large 340M (fine-tuned) | 64.5 | SocialIQA: Commonsense Reasoning about Social Interactions | - |