| ELC-BERT-base 98M (zero init) | 84.4 | 84.5 | Not all layers are equally as important: Every Layer Counts BERT | - |
| Snorkel MeTaL (ensemble) | 87.6 | 87.2 | Training Complex Models with Multi-Task Weak Supervision | - |
| GPST (unsupervised generative syntactic LM) | 81.8 | 82.0 | Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | - |