HyperAI超神经

Code Generation on HumanEval

Evaluation Metrics

Pass@1
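
For reference, Pass@1 is the k = 1 case of the pass@k metric commonly reported for HumanEval: the share of problems for which a generated solution passes all unit tests. The sketch below uses the widely cited unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn for a problem and c is the number that pass. This page does not state how each listed entry computed its score, so the function names and the sample data here are illustrative assumptions rather than the leaderboard's own procedure.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that pass all tests, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples
    # contains no passing sample, subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over all problems, as a percentage.

    results: one (n, c) pair per problem (hypothetical input format).
    """
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up example: 4 problems, one sample each, 3 of them solved.
    print(benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)], k=1))
```

With a single sample per problem, the estimator reduces to the plain fraction of problems solved, which is how a Pass@1 percentage such as those listed on this page is usually read.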

Evaluation Results

Performance of each model on this benchmark

Comparison Table

Model name                                      Pass@1
from-code-to-correctness-closing-the-last       96.3
ldb-a-large-language-model-debugger-via         98.2
hierarchical-prompting-taxonomy-a-universal     100
hierarchical-prompting-taxonomy-a-universal     100
aflow-automating-agentic-workflow-generation    94.7
codesim-multi-agent-code-generation-and-1       97.6
Model 7                                         92.0
codesim-multi-agent-code-generation-and-1       98.8
l2mac-large-language-model-automatic-computer   90.2
agentcoder-multi-agent-based-code-generation    96.3
codesim-multi-agent-code-generation-and-1       95.1
mapcoder-multi-agent-code-generation-for        93.9
octopack-instruction-tuning-code-large          86.6
Model 14                                        91.65
ldb-a-large-language-model-debugger-via         99.4
qualityflow-an-agentic-workflow-for-program     98.8
nexus-a-lightweight-and-scalable-multi-agent    98.8
planning-driven-programming-a-large-language    98.2
Model 19                                        85.97
claude-3-5-sonnet-model-card-addendum           90.2
metagpt-meta-programming-for-multi-agent        85.9