Code Llama - Instruct 13B (3-shot) | 49.4 | Code Llama: Open Foundation Models for Code | |
Code Llama 7B (3-shot) | 41.4 | Code Llama: Open Foundation Models for Code | |
GPT-3.5 Turbo + Language Agent Tree Search | 81.1 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
StarCoder 15.5B (Self-Debugging with unit tests + trace) | 53.2 | Teaching Large Language Models to Self-Debug | |
code-cushman-001 12B (CodeT) | 55.4 | CodeT: Code Generation with Generated Tests | |
CodeGen 16B + Coder-Reviewer | 46.2 | Coder Reviewer Reranking for Code Generation | |
code-davinci-002 175B + CodeT | 67.7 | CodeT: Code Generation with Generated Tests | |
Code Llama - Instruct 7B (3-shot) | 44.4 | Code Llama: Open Foundation Models for Code | |
o1-mini + Language Agent Tree Search (Hamming.ai) | 82.3 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
Code Llama - Python 70B (3-shot) | 65.5 | Code Llama: Open Foundation Models for Code | |