Code Generation on MBPP
Evaluation metric
Accuracy
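The page does not define how Accuracy is computed. On MBPP it is commonly reported as the percentage of problems for which the generated program passes that problem's test cases. Assuming that definition, a minimal sketch of the calculation could look like the following; the helper names `passes_all_tests` and `accuracy` and the toy problems are hypothetical and are not taken from the benchmark or from any listed paper.

```python
from typing import Sequence


def passes_all_tests(candidate_src: str, test_asserts: Sequence[str]) -> bool:
    """Return True if the candidate code satisfies every assert statement.

    Assumption: each test is a plain Python `assert` string.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)      # define the candidate function
        for assertion in test_asserts:
            exec(assertion, namespace)      # raises AssertionError on failure
    except Exception:
        return False                        # any error counts as a failed problem
    return True


def accuracy(results: Sequence[bool]) -> float:
    """Percentage of problems whose candidate passed all of its tests."""
    if not results:
        raise ValueError("no results to score")
    return 100.0 * sum(results) / len(results)


# Toy illustration (hypothetical problems, not real MBPP items).
problems = [
    ("def add(a, b):\n    return a + b", ["assert add(1, 2) == 3"]),
    ("def square(x):\n    return x + x", ["assert square(3) == 9"]),
]
results = [passes_all_tests(src, tests) for src, tests in problems]
print(accuracy(results))
```

Running model-generated code with `exec` is only reasonable for a toy illustration; a real evaluation harness would isolate each candidate in a sandboxed process with a timeout.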
Evaluation results
Performance of each model on this benchmark
| Model | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| QualityFlow (Sonnet-3.5) | 94.2 | QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | - |
| o1-mini + MapCoder | 93.2 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | |
| GPT-4 + AgentCoder | 91.8 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | |
| CodeSim (GPT4o) | 90.7 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | |
| Jiutian-大模型 | 90.0 | - | - |
| GPT-3.5 Turbo (ChatGPT) + AgentCoder | 89.9 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | |
| MapCoder (GPT-4o) | 89.7 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | |
| GPT-4 (ChatGPT Plus) | 87.5 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| Claude 3 Opus | 86.4 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| LPW (GPT-4o) | 84.8 | Planning-Driven Programming: A Large Language Model Programming Workflow | |
| GPT-3.5 Turbo + FlowGenScrum + Test | 83.8±0.6 | SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | - |
| AFlow(GPT-4o-mini) | 83.4 | AFlow: Automating Agentic Workflow Generation | |
| GPT-3.5 Turbo (ChatGPT) | 83.2 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| MapCoder (GPT-4) | 83.1 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | |
| o1-mini + Language Agent Tree Search (Hamming.ai) | 82.3 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
| GPT-4 (Bing Chat) | 82 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| GPT-3.5 Turbo + Language Agent Tree Search | 81.1 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
| MGDebugger (CodeQwen1.5) | 80.8 | From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | |
| Claude 3 Haiku | 80.4 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| GPT-4 (Self-Debugging with unit tests + trace) | 80.2 | Teaching Large Language Models to Self-Debug | |
The leaderboard lists 96 entries in total; the top 20 are shown here.