HyperAI超神经

Multi-Task Language Understanding on BBH-alg

Evaluation Metrics

Average (%)

Evaluation Results

Performance of each model on this benchmark

Comparison Table
Model Name                                       Average (%)
scaling-instruction-finetuned-language-models    61.3
evaluating-large-language-models-trained-on      73.9
scaling-instruction-finetuned-language-models    57.6
scaling-instruction-finetuned-language-models    66.5
scaling-instruction-finetuned-language-models    38.3
scaling-instruction-finetuned-language-models    48.2
scaling-instruction-finetuned-language-models    62.2