| ELC-BERT-base 98M (zero init) | 84.4 | 84.5 | Not all layers are equally as important: Every Layer Counts BERT | - |
| Snorkel MeTaL (ensemble) | 87.6 | 87.2 | Training Complex Models with Multi-Task Weak Supervision | - |
| GPST (unsupervised generative syntactic LM) | 81.8 | 82.0 | Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | - |