Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser
Adhiguna Kuncoro; Miguel Ballesteros; Lingpeng Kong; Chris Dyer; Noah A. Smith

Abstract
We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a "distillation" of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German.
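As a rough illustration of the consensus idea (not the authors' implementation), the sketch below aggregates each ensemble member's predicted head for every token into per-arc vote fractions. Under the Hamming cost, minimum Bayes risk decoding maximizes the total expected vote over arcs; the paper obtains this with a first-order (MST) decoder over the same arc scores. The function names and the per-token argmax shortcut here are illustrative assumptions, not the paper's code; a full decoder would run Chu-Liu/Edmonds to guarantee a well-formed tree.

```python
from collections import Counter
from typing import List

def consensus_arc_scores(ensemble_heads: List[List[int]]) -> List[Counter]:
    """Turn per-parser head predictions into vote fractions per token.

    ensemble_heads[k][i] is the head that parser k assigns to token i+1
    (tokens are numbered 1..n; head 0 denotes the artificial root).
    scores[i][h] is the fraction of parsers voting for the arc h -> i+1,
    an ensemble estimate of that arc's marginal probability.
    """
    n_parsers = len(ensemble_heads)
    n_tokens = len(ensemble_heads[0])
    scores = [Counter() for _ in range(n_tokens)]
    for heads in ensemble_heads:
        for i, h in enumerate(heads):
            scores[i][h] += 1.0 / n_parsers
    return scores

def mbr_heads(ensemble_heads: List[List[int]]) -> List[int]:
    """Approximate MBR decoding under the Hamming cost.

    Under Hamming cost the optimal parse maximizes the sum of per-arc vote
    fractions.  For brevity we take the argmax head for each token
    independently; the paper instead runs MST decoding over these scores,
    which additionally enforces the tree constraint.
    """
    scores = consensus_arc_scores(ensemble_heads)
    return [max(s, key=s.get) for s in scores]

# Toy usage: three greedy parsers, three tokens; parser C disagrees on token 1.
votes = [
    [2, 0, 2],  # parser A: head(1)=2, head(2)=ROOT, head(3)=2
    [2, 0, 2],  # parser B agrees with A
    [0, 1, 2],  # parser C attaches token 1 to ROOT and token 2 to token 1
]
print(mbr_heads(votes))  # -> [2, 0, 2]
```

Weak consensus shows up directly in these scores: a token whose vote mass is split across several heads is one the ensemble finds difficult or ambiguous, which is the uncertainty signal the distillation cost exploits.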
Benchmarks
| Benchmark | Methodology | UAS | LAS | POS |
|---|---|---|---|---|
| dependency-parsing-on-penn-treebank | Distilled neural FOG | 94.26 | 92.06 | 97.44 |