
Code Generation on APPS

Metrics

Competition Pass@1
Interview Pass@1
Introductory Pass@1
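
For readers unfamiliar with the metric, the following is a minimal sketch of how a Pass@1 score can be computed with the unbiased pass@k estimator introduced in "Evaluating Large Language Models Trained on Code" (listed in the results). It assumes that for each problem `n` samples were generated and `c` of them passed all tests. The function names and the example data are hypothetical, and individual leaderboard entries may have been computed with a different protocol (for example, a single greedy sample per problem).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k=1):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples drawn, samples passing all tests)
example = [(10, 0), (10, 5), (10, 10)]
score = mean_pass_at_k(example, k=1)
print(f"pass@1 = {100 * score:.2f}%")
```

For k = 1 the estimator reduces to c / n per problem, so Pass@1 is the average fraction of correct samples across the problems in a difficulty split (Introductory, Interview, or Competition).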

Results

Performance results of various models on this benchmark

| Model Name | Competition Pass@1 | Interview Pass@1 | Introductory Pass@1 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| MoTCoder-7B-V1.5 | 21.18 | 32.63 | 54.26 | MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks | |
| deepseek-ai/deepseek-coder-6.7b-instruct | 11.09 | 19.70 | 33.80 | DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | |
| MoTCoder-32B-V1.5 | 27.84 | 44.49 | 68.44 | MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks | |
| code-davinci-002 175B | - | - | 31.92 | CodeT: Code Generation with Generated Tests | |
| GPT-Neo 2.7B | 0.00% | 0.57% | 3.90% | Measuring Coding Challenge Competence With APPS | |
| CodeChain+WizardCoder-15b | 2.5% | 6.4% | 29.3% | CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | - |
| CodeRL+CodeT5 | 33.3 | 13.5 | 20 | CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | |
| GPT-J 6B (Finetuned) | 0.69% | 1.80% | 6.77% | CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | |
| AlphaCode 1B | - | - | - | Competition-Level Code Generation with AlphaCode | |
| WizardCoder-15b | 3.75 | 7.49 | 26.29 | CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | - |
| LPW (GPT-4o) | 34.8 | 65.2 | 87.2 | Planning-Driven Programming: A Large Language Model Programming Workflow | |
| AlphaCode 1B Filtered from 50000 | - | - | - | Competition-Level Code Generation with AlphaCode | |
| CodeSim (GPT4) | 0.81 | 4.21 | 26.04 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | |
| Codex 12B (Raw) | 0.50% | 1.00% | 5.60% | Evaluating Large Language Models Trained on Code | |
| GPT-Neo 2.7B (Finetuned) | 0.02% | 0.14% | 4.14% | CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | |
| code-davinci-002 175B (CodeT) | 6.2% | 14.3% | 47.3% | CodeT: Code Generation with Generated Tests | |
| MapCoder APPS-150-cherrypicked (GPT-4) | 0.00% | 0.70% | 1.30% | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | |
| GPT2 1.5B (Finetuned) | 0.00% | 0.57% | 3.90% | CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | |