HyperAI超神经

Code Generation on MBPP

Evaluation Metrics

Accuracy

Evaluation Results

Performance of each model on this benchmark

| Model Name | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| LLaMA 33B (0-shot) | 30.2 | LLaMA: Open and Efficient Foundation Language Models | |
| Code Llama - Instruct 13B (3-shot) | 49.4 | Code Llama: Open Foundation Models for Code | |
| Code Llama 7B (3-shot) | 41.4 | Code Llama: Open Foundation Models for Code | |
| GPT-3.5 Turbo + Language Agent Tree Search | 81.1 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
| StarCoder 15.5B (Self-Debugging with unit tests + trace) | 53.2 | Teaching Large Language Models to Self-Debug | - |
| code-cushman-001 12B (CodeT) | 55.4 | CodeT: Code Generation with Generated Tests | |
| LPW (GPT-4o) | 84.8 | Planning-Driven Programming: A Large Language Model Programming Workflow | |
| CodeGen 16B + Coder-Reviewer | 46.2 | Coder Reviewer Reranking for Code Generation | |
| Llama 2 34B (0-shot) | 33 | Llama 2: Open Foundation and Fine-Tuned Chat Models | |
| GPT-3.5 Turbo (0-shot) | 39.8 | INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair | |
| MapCoder (GPT-4) | 83.1 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | |
| GPT-4 (ChatGPT Plus) | 87.5 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| code-davinci-002 175B + CodeT | 67.7 | CodeT: Code Generation with Generated Tests | |
| GPT-3.5 Turbo (ChatGPT) + AgentCoder | 89.9 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | |
| Claude | 71.4 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| Code Llama - Instruct 7B (3-shot) | 44.4 | Code Llama: Open Foundation Models for Code | |
| o1-mini + Language Agent Tree Search (Hamming.ai) | 82.3 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
| PaLM Coder 540B | 47 | PaLM: Scaling Language Modeling with Pathways | |
| Llama 2 7B (0-shot) | 20.8 | Llama 2: Open Foundation and Fine-Tuned Chat Models | |
| Code Llama - Python 70B (3-shot) | 65.5 | Code Llama: Open Foundation Models for Code | |