Muppet: Massive Multi-task Representations with Pre-Finetuning
Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta

Abstract
We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when too few tasks are used, up to a critical point (usually above 15), after which performance improves linearly in the number of tasks.
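In essence, pre-finetuning inserts a multi-task stage in which one shared encoder is trained on many labeled datasets at once, each with its own task-specific head. The sketch below illustrates this setup with a shared RoBERTa encoder and per-task classification heads; the task list, batch handling, and loss scaling are illustrative assumptions in the spirit of the paper, not its exact ~50-dataset recipe.

```python
# Minimal multi-task pre-finetuning sketch (PyTorch + HuggingFace transformers).
# The task set, sampling, and loss scaling below are illustrative assumptions,
# not the exact recipe from the paper.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiTaskModel(nn.Module):
    """Shared encoder with one lightweight classification head per task."""

    def __init__(self, encoder_name: str, task_num_labels: dict[str, int]):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_num_labels.items()}
        )

    def forward(self, task: str, **inputs):
        # Use the first-token (<s>) representation for classification.
        hidden_states = self.encoder(**inputs).last_hidden_state
        return self.heads[task](hidden_states[:, 0])

# Hypothetical task subset; the paper pre-finetunes on around 50 datasets.
task_num_labels = {"mnli": 3, "sst2": 2, "boolq": 2}
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = MultiTaskModel("roberta-base", task_num_labels)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = nn.CrossEntropyLoss()

def training_step(task: str, texts: list[str], labels: torch.Tensor) -> float:
    """One update for a batch drawn from a single task."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(task, **batch)
    # Down-weight tasks with many classes so they do not dominate the shared
    # encoder's gradients -- an assumption inspired by the paper's loss-scaling
    # discussion, not its exact formula.
    scale = torch.log(torch.tensor(float(task_num_labels[task])))
    loss = loss_fn(logits, labels) / scale
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In practice, batches from the different tasks would be interleaved (e.g., sampled in proportion to dataset size) so that the shared encoder sees all tasks throughout pre-finetuning.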
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| Abstractive Text Summarization on CNN/Daily Mail | MUPPET BART Large | ROUGE-1: 44.45, ROUGE-2: 21.25, ROUGE-L: 41.4 |
| Common Sense Reasoning on CommonsenseQA | MUPPET RoBERTa Large | Accuracy: 79.2% |
| Natural Language Inference on RTE | MUPPET RoBERTa Large | Accuracy: 92.8% |
| Question Answering on BoolQ | MUPPET RoBERTa Base | Accuracy: 83.8% |
| Question Answering on BoolQ | MUPPET RoBERTa Large | Accuracy: 87.5% |
| Sentiment Analysis on SST-2 (Binary) | MUPPET RoBERTa Base | Accuracy: 96.7% |
| Sentiment Analysis on SST-2 (Binary) | MUPPET RoBERTa Large | Accuracy: 97.4% |
| Text Summarization on GigaWord | MUPPET BART Large | ROUGE-1: 40.4, ROUGE-2: 20.54, ROUGE-L: 36.21 |
| Text Summarization on Reddit TIFU | MUPPET BART Large | ROUGE-1: 30.3, ROUGE-2: 11.25, ROUGE-L: 24.92 |
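Assuming the pre-finetuned checkpoints behind these results are available on the HuggingFace Hub (e.g., under an identifier such as facebook/muppet-roberta-base), they can be fine-tuned on a single downstream task just like the standard RoBERTa weights; a minimal, hypothetical sketch for a binary sentiment task such as SST-2:

```python
# Fine-tuning a pre-finetuned (MUPPET) checkpoint on a binary sentiment task.
# The Hub identifier is an assumption, and the two sentences are placeholder
# examples, not SST-2 data.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "facebook/muppet-roberta-base"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# A fresh 2-way classification head is initialized on top of the
# pre-finetuned encoder; only the encoder weights come from pre-finetuning.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["a gripping, beautifully shot film", "tedious and instantly forgettable"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss  # standard single-task fine-tuning loss
loss.backward()
```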