Better Fine-Tuning by Reducing Representational Collapse
Armen Aghajanyan; Akshat Shrivastava; Anchit Gupta; Naman Goyal; Luke Zettlemoyer; Sonal Gupta

Abstract
Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampled from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning where possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse: the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including CNN/DailyMail, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representational collapse: the pre-trained models maintain more generalizable representations every time they are fine-tuned.
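The core idea the abstract describes can be illustrated compactly: perturb the input embeddings with parametric noise (normal or uniform), and add a consistency term that penalizes divergence between the model's output distributions on clean and noisy inputs. The sketch below is a minimal, framework-free illustration of that loss shape using numpy; the function and parameter names (`r3f_style_loss`, `sigma`, `lam`) are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p), summed over the batch."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q) + q * np.log(q / p)))

def r3f_style_loss(model, embeddings, labels, task_loss_fn,
                   sigma=1e-5, lam=1.0, rng=None):
    """Illustrative noise-consistency loss: task loss on clean inputs plus a
    symmetric-KL penalty between predictions on clean and noise-perturbed
    embeddings. `sigma` scales the noise, `lam` weights the penalty
    (both names are assumptions, not the paper's notation)."""
    rng = np.random.default_rng(rng)
    # Parametric noise; a uniform alternative would be
    # rng.uniform(-sigma, sigma, size=embeddings.shape).
    noise = rng.normal(0.0, sigma, size=embeddings.shape)
    clean_logits = model(embeddings)
    noisy_logits = model(embeddings + noise)
    consistency = sym_kl(softmax(clean_logits), softmax(noisy_logits))
    return task_loss_fn(clean_logits, labels) + lam * consistency
```

In practice the model, embeddings, and task loss would come from a pre-trained transformer and its fine-tuning objective; here any callable mapping embeddings to logits works, e.g. a toy linear layer.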
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| abstractive-text-summarization-on-cnn-daily | BART+R3F | ROUGE-1: 44.38, ROUGE-2: 21.53, ROUGE-L: 41.17 |
| cross-lingual-natural-language-inference-on | XLM-R R4F | Accuracy: 84.7% |
| cross-lingual-natural-language-inference-on-1 | XLM-R R4F | Accuracy: 85.2% |
| cross-lingual-natural-language-inference-on-3 | XLM-R R4F | Accuracy: 84.2% |
| text-summarization-on-gigaword | BART-RXF | ROUGE-1: 40.45, ROUGE-2: 20.69, ROUGE-L: 36.56 |
| text-summarization-on-reddit-tifu | BART+R3F | ROUGE-1: 30.31, ROUGE-2: 10.98, ROUGE-L: 24.74 |