Question Answering on DROP
Evaluation Metrics
Accuracy
Evaluation Results
Performance of each model on this benchmark.
Model Name | Accuracy | Paper Title | Repository |
---|---|---|---|
PaLM 540B (Self Consistency) | 78.2 | Large Language Models Can Self-Improve | - |
PaLM 540B (Self Improvement, Self Consistency) | 83 | Large Language Models Can Self-Improve | - |
PaLM 540B (Self Improvement, Standard-Prompting) | 71.7 | Large Language Models Can Self-Improve | - |
PaLM 540B (Standard-Prompting) | 60 | Large Language Models Can Self-Improve | - |
PaLM 540B (CoT Prompting) | 70.6 | Large Language Models Can Self-Improve | - |
PaLM 540B (Self Improvement, CoT Prompting) | 76.2 | Large Language Models Can Self-Improve | - |