Multi-Task Language Understanding on MGSM
Evaluation metric
Average (%)
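The page does not say how the average is computed. For a multilingual benchmark such as MGSM, one common reading is an unweighted mean of per-language accuracies, and the sketch below assumes exactly that. The function name `average_accuracy`, the language codes and the counts in the example are hypothetical and do not come from this leaderboard.

```python
def average_accuracy(correct: dict[str, int], total: dict[str, int]) -> float:
    """Unweighted mean of per-language accuracy, in percent.

    Assumption: every language counts equally, regardless of how many
    problems it contains.
    """
    # Accuracy (%) for each language present in `total`.
    per_language = [100.0 * correct[lang] / total[lang] for lang in total]
    # Plain arithmetic mean across languages.
    return sum(per_language) / len(per_language)


if __name__ == "__main__":
    # Hypothetical counts for three languages, 250 problems each.
    correct = {"en": 200, "de": 150, "sw": 100}
    total = {"en": 250, "de": 250, "sw": 250}
    print(average_accuracy(correct, total))  # 80, 60 and 40 percent average to 60.0
```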
Evaluation results
Performance of each model on this benchmark
Comparison table
Model name | Average (%) |
---|---|
transcending-scaling-laws-with-0-1-extra | 49.9 |
palm-scaling-language-modeling-with-pathways-1 | 55.0 |
palm-2-technical-report-1 | 87.0 |
scaling-instruction-finetuned-language-models | 60.4 |
scaling-instruction-finetuned-language-models | 72.0 |
scaling-instruction-finetuned-language-models | 35 |
scaling-instruction-finetuned-language-models | 57.0 |
scaling-instruction-finetuned-language-models | 5.7 |
scaling-instruction-finetuned-language-models | 36 |
scaling-instruction-finetuned-language-models | 21.2 |
scaling-instruction-finetuned-language-models | 23.7 |
palm-2-technical-report-1 | 72.2 |
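The comparison table is not sorted by score, and the same paper slug appears several times. Those repeats presumably correspond to different model variants reported in one paper. The sketch below copies the rows as listed and ranks them by Average (%). The names `ROWS` and `rank` are illustrative only.

```python
# Rows copied from the comparison table: (paper slug, Average %).
# Repeated slugs are kept as separate entries, in table order.
ROWS = [
    ("transcending-scaling-laws-with-0-1-extra", 49.9),
    ("palm-scaling-language-modeling-with-pathways-1", 55.0),
    ("palm-2-technical-report-1", 87.0),
    ("scaling-instruction-finetuned-language-models", 60.4),
    ("scaling-instruction-finetuned-language-models", 72.0),
    ("scaling-instruction-finetuned-language-models", 35.0),
    ("scaling-instruction-finetuned-language-models", 57.0),
    ("scaling-instruction-finetuned-language-models", 5.7),
    ("scaling-instruction-finetuned-language-models", 36.0),
    ("scaling-instruction-finetuned-language-models", 21.2),
    ("scaling-instruction-finetuned-language-models", 23.7),
    ("palm-2-technical-report-1", 72.2),
]


def rank(rows: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return rows sorted by score, highest first (stable for ties)."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for position, (name, score) in enumerate(rank(ROWS), start=1):
        print(f"{position:>2}. {score:5.1f}  {name}")
```

Because `sorted` is stable, any tied entries would keep their original table order.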