Natural Language Inference on ANLI Test

Evaluation Metrics

A1, A2, A3: accuracy (%) on the test sets of ANLI rounds 1, 2 and 3.
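ANLI's test data is split into three adversarial rounds, and the per-round scores can be read as plain classification accuracy over each round's examples. The sketch below illustrates that computation under a stated assumption: the `(round, gold_label, predicted_label)` record layout and the `per_round_accuracy` helper are hypothetical, chosen for illustration, and are not part of any official ANLI tooling.

```python
from collections import defaultdict


def per_round_accuracy(records):
    """Return accuracy (%) per ANLI round.

    `records` is an iterable of (round, gold_label, predicted_label) tuples;
    this tuple layout is a hypothetical format used only for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rnd, gold, pred in records:
        total[rnd] += 1
        if gold == pred:
            correct[rnd] += 1
    # Report each round as a percentage, on the same scale as the leaderboard.
    return {rnd: 100.0 * correct[rnd] / total[rnd] for rnd in sorted(total)}


example = [
    ("A1", "entailment", "entailment"),
    ("A1", "neutral", "contradiction"),
    ("A2", "contradiction", "contradiction"),
    ("A3", "neutral", "neutral"),
    ("A3", "entailment", "neutral"),
]
print(per_round_accuracy(example))  # {'A1': 50.0, 'A2': 100.0, 'A3': 50.0}
```

Because each round is scored separately, a paper can report only some rounds, which is why some rows show "-" in a column.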

Evaluation Results

Performance of each model on this benchmark:

Model	A1	A2	A3	Paper Title	Repository
T5-3B (explanation prompting)	81.8	72.5	74.8	Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues}	-
PaLM 540B (Self Improvement, Self Consistency)	-	66.5	67.9	Large Language Models Can Self-Improve	-
PaLM 540B (Self Improvement, CoT Prompting)	-	65.3	67.3	Large Language Models Can Self-Improve	-
PaLM 540B (Self Improvement, Standard-Prompting)	-	64.8	66.9	Large Language Models Can Self-Improve	-
PaLM 540B (Self Consistency)	-	64.5	63.4	Large Language Models Can Self-Improve	-
PaLM 2-L (one-shot)	73.1	63.4	67.1	PaLM 2 Technical Report
T0-11B (explanation prompting)	75.6	60.6	59.9	Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues}	-
PaLM 540B (CoT Prompting)	-	58.9	60.6	Large Language Models Can Self-Improve	-
PaLM 540B (Standard-Prompting)	-	55.8	55.8	Large Language Models Can Self-Improve	-
ChatGPT	62.3	52.6	54.1	A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
ALUM (RoBERTa-LARGE)	72.3	52.1	48.4	Adversarial Training for Large Neural Language Models
XLNet (Large)	70.3	50.9	49.4	XLNet: Generalized Autoregressive Pretraining for Language Understanding
InfoBERT (RoBERTa)	75	50.5	47.7	InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective
RoBERTa (Large)	72.4	49.8	44.4	RoBERTa: A Robustly Optimized BERT Pretraining Approach
PaLM 2-M (one-shot)	58.1	49.5	54.5	PaLM 2 Technical Report
PaLM 2-S (one-shot)	53.1	48.8	53.2	PaLM 2 Technical Report
T0-3B (CoT fine-tuned)	41.7	37.2	41.9	The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Flipped-3B	39.99	37.05	37.73	Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
KiC-770M	36.30	35.00	37.60	Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models	-
RoE-3B	35.49	34.64	31.22	Exploring the Benefits of Training Expert Language Models over Instruction Tuning
