HyperAI超神经

Natural Language Inference on ANLI (Test)

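Evaluation Metrics

A1: accuracy (%) on the Round 1 test set
A2: accuracy (%) on the Round 2 test set
A3: accuracy (%) on the Round 3 test set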
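Each metric is ordinary classification accuracy, computed separately for each adversarial round. The sketch below shows one way to compute it. It assumes, hypothetically, that evaluation output is available as `(round_name, gold_label, predicted_label)` tuples; the function name `per_round_accuracy` and this data layout are illustrative choices, not part of any official ANLI tooling.

```python
from collections import defaultdict


def per_round_accuracy(examples):
    """Return accuracy (%) per ANLI round.

    `examples` is an iterable of (round_name, gold_label, predicted_label)
    tuples, e.g. ("A1", "entailment", "neutral"). This layout is a
    hypothetical choice for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for round_name, gold, pred in examples:
        total[round_name] += 1
        correct[round_name] += int(gold == pred)
    # Accuracy is reported as a percentage, matching the leaderboard columns.
    return {r: 100.0 * correct[r] / total[r] for r in sorted(total)}


demo = [
    ("A1", "entailment", "entailment"),
    ("A1", "neutral", "contradiction"),
    ("A2", "contradiction", "contradiction"),
]
print(per_round_accuracy(demo))
```

Keeping the rounds separate matters because later rounds were collected against stronger models, so a single pooled accuracy would hide how performance changes from A1 to A3.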

Results

Performance of each model on this benchmark:

| Model | A1 | A2 | A3 | Paper title | Repository |
|---|---|---|---|---|---|
| ChatGPT | 62.3 | 52.6 | 54.1 | A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets | Yes |
| GPT-3 | 36.8 | 34 | 40.2 | Language Models are Few-Shot Learners | Yes |
| PaLM 2-S (one-shot) | 53.1 | 48.8 | 53.2 | PaLM 2 Technical Report | Yes |
| T0-11B (explanation prompting) | 75.6 | 60.6 | 59.9 | Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues} | - |
| KiC-770M | 36.30 | 35.00 | 37.60 | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | - |
| BLOOM 176B (one-shot) | 33.6 | 33.8 | 35.17 | BloombergGPT: A Large Language Model for Finance | - |
| OPT 66B (one-shot) | 33.1 | 34.2 | 34.92 | BloombergGPT: A Large Language Model for Finance | - |
| PaLM 540B (Self Consistency) | - | 64.5 | 63.4 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Self Improvement, Self Consistency) | - | 66.5 | 67.9 | Large Language Models Can Self-Improve | - |
| RoE-3B | 35.49 | 34.64 | 31.22 | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | Yes |
| GPT-NeoX (one-shot) | 32.6 | 33.8 | 36.17 | BloombergGPT: A Large Language Model for Finance | - |
| T5-3B (explanation prompting) | 81.8 | 72.5 | 74.8 | Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues} | - |
| InfoBERT (RoBERTa) | 75 | 50.5 | 47.7 | InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective | Yes |
| PaLM 2-L (one-shot) | 73.1 | 63.4 | 67.1 | PaLM 2 Technical Report | Yes |
| T0-3B (CoT fine-tuned) | 41.7 | 37.2 | 41.9 | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | - |
| PaLM 540B (CoT Prompting) | - | 58.9 | 60.6 | Large Language Models Can Self-Improve | - |
| RoBERTa (Large) | 72.4 | 49.8 | 44.4 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | Yes |
| PaLM 540B (Self Improvement, Standard-Prompting) | - | 64.8 | 66.9 | Large Language Models Can Self-Improve | - |
| Bloomberg GPT (one-shot) | 32.9 | 34.4 | 37.33 | BloombergGPT: A Large Language Model for Finance | - |
| XLNet (Large) | 70.3 | 50.9 | 49.4 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | Yes |
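The leaderboard reports three separate numbers per model, and some PaLM 540B rows have no A1 score. One way to get a single ordering is to average A1 to A3 and skip incomplete rows. The sketch below does this for a few rows copied from the table. The unweighted mean, the `SCORES` dictionary, and the `rank_by_mean` helper are all hypothetical choices for illustration. The leaderboard itself does not define an aggregate score.

```python
# Scores (A1, A2, A3) copied from the results table; None marks an
# unreported round. The selection of rows is arbitrary.
SCORES = {
    "T5-3B (explanation prompting)": (81.8, 72.5, 74.8),
    "T0-11B (explanation prompting)": (75.6, 60.6, 59.9),
    "PaLM 2-L (one-shot)": (73.1, 63.4, 67.1),
    "InfoBERT (RoBERTa)": (75.0, 50.5, 47.7),
    "PaLM 540B (Self Consistency)": (None, 64.5, 63.4),
}


def rank_by_mean(scores):
    """Sort models by unweighted mean accuracy over A1-A3.

    Rows with any missing round are skipped rather than averaged over
    fewer rounds, so every mean covers the same three test sets.
    """
    complete = {
        name: sum(vals) / len(vals)
        for name, vals in scores.items()
        if all(v is not None for v in vals)
    }
    return sorted(complete.items(), key=lambda item: item[1], reverse=True)


for name, mean in rank_by_mean(SCORES):
    print(f"{name}: {mean:.2f}")
```

Skipping incomplete rows avoids comparing a two-round mean with a three-round mean. The rows also mix fine-tuned, one-shot, and prompting setups, so any such aggregate ranks reported numbers rather than comparable methods.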