GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation
Mohsen Gholami, Mohammad Akbari, Cindy Hu, Vaden Masrani, Z. Jane Wang, Yong Zhang

Abstract
Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed data generation using LLMs for preparing distilled models. We argue that generating data with LLMs is prone to sampling mainly from the center of the original content distribution. This limitation hinders the distilled model from learning the true underlying data distribution and causes it to forget the tails of the distribution (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework that employs an iterative out-of-distribution-guided feedback mechanism for the LLM. As a result, the generated data improves the generalizability of distilled models. An energy-based OOD evaluation approach is also introduced to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD outperforms prior methods and the LLM with average improvements of 5% and 14%, respectively. We also show that the proposed method is applicable to less-explored and novel tasks. The code is available.
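The abstract does not spell out the energy-based OOD evaluation. As context, a common formulation in the OOD-detection literature derives a per-sample energy from classifier logits, E(x) = -T · logsumexp(f(x)/T), and treats high-energy samples as likely out-of-distribution. The sketch below illustrates that standard score only; the `select_ood_samples` helper and its threshold are hypothetical, and GOLD's actual scoring and filtering rules may differ.

```python
import torch

def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Standard energy score: E(x) = -T * logsumexp(f(x) / T).
    # Lower energy -> more in-distribution; higher energy -> likely OOD.
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

def select_ood_samples(logits: torch.Tensor, threshold: float) -> torch.Tensor:
    # Hypothetical helper: flag generated samples whose energy exceeds a
    # threshold, e.g. to filter noisy generations or to surface tail
    # examples for the iterative LLM feedback loop the abstract describes.
    return energy_score(logits) > threshold
```

In an iterative loop like the one the abstract describes, such a score could separate informative tail samples from noisy generations, but the exact criterion is the paper's contribution and is not reproduced here.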
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| Data-Free Knowledge Distillation on QNLI | GOLD (T5-base) | Accuracy: 91.7 |
| Data-Free Knowledge Distillation on SQuAD | GOLD (T5-base) | Exact Match: 75.2 |