Better Fine-Tuning by Reducing Representational Collapse
Armen Aghajanyan; Akshat Shrivastava; Anchit Gupta; Naman Goyal; Luke Zettlemoyer; Sonal Gupta

Abstract
Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampled from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning where possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse: the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including CNN/DailyMail, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representational collapse: the pre-trained models maintain more generalizable representations every time they are fine-tuned.
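The core idea the abstract describes can be illustrated compactly: perturb the input embeddings with parametric noise (normal or uniform), and add a consistency term that penalizes divergence between the model's output distributions on clean and noisy inputs. The sketch below is a minimal, framework-free illustration of that loss shape using numpy; the function and parameter names (`r3f_style_loss`, `sigma`, `lam`) are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p), summed over the batch."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q) + q * np.log(q / p)))

def r3f_style_loss(model, embeddings, labels, task_loss_fn,
                   sigma=1e-5, lam=1.0, rng=None):
    """Illustrative noise-consistency loss: task loss on clean inputs plus a
    symmetric-KL penalty between predictions on clean and noise-perturbed
    embeddings. `sigma` scales the noise, `lam` weights the penalty
    (both names are assumptions, not the paper's notation)."""
    rng = np.random.default_rng(rng)
    # Parametric noise; a uniform alternative would be
    # rng.uniform(-sigma, sigma, size=embeddings.shape).
    noise = rng.normal(0.0, sigma, size=embeddings.shape)
    clean_logits = model(embeddings)
    noisy_logits = model(embeddings + noise)
    consistency = sym_kl(softmax(clean_logits), softmax(noisy_logits))
    return task_loss_fn(clean_logits, labels) + lam * consistency
```

In practice the model, embeddings, and task loss would come from a pre-trained transformer and its fine-tuning objective; here any callable mapping embeddings to logits works, e.g. a toy linear layer.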
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| abstractive-text-summarization-on-cnn-daily | BART+R3F | ROUGE-1: 44.38, ROUGE-2: 21.53, ROUGE-L: 41.17 |
| cross-lingual-natural-language-inference-on | XLM-R R4F | Accuracy: 84.7% |
| cross-lingual-natural-language-inference-on-1 | XLM-R R4F | Accuracy: 85.2% |
| cross-lingual-natural-language-inference-on-3 | XLM-R R4F | Accuracy: 84.2% |
| text-summarization-on-gigaword | BART-RXF | ROUGE-1: 40.45, ROUGE-2: 20.69, ROUGE-L: 36.56 |
| text-summarization-on-reddit-tifu | BART+R3F | ROUGE-1: 30.31, ROUGE-2: 10.98, ROUGE-L: 24.74 |