Bohdan Petryshyn, Mantas Lukoševičius

Abstract
Recent advancements in Large Language Models (LLMs) and their utilization in code generation tasks have significantly reshaped the field of software development. Despite the remarkable efficacy of code completion solutions in mainstream programming languages, their performance lags when applied to less ubiquitous formats such as OpenAPI definitions. This study evaluates the OpenAPI completion performance of GitHub Copilot, a prevalent commercial code completion tool, and proposes a set of task-specific optimizations leveraging Meta's open-source model Code Llama. A semantics-aware OpenAPI completion benchmark proposed in this research is used to perform a series of experiments through which the impact of various prompt-engineering and fine-tuning techniques on the Code Llama model's performance is analyzed. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot despite utilizing 25 times fewer parameters than the commercial solution's underlying Codex model. Additionally, this research proposes an enhancement to a widely used code infilling training technique, addressing the issue of underperformance when the model is prompted with context sizes smaller than those used during training. The dataset, the benchmark, and the model fine-tuning code are made publicly available.
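As a rough illustration of the infilling-style prompting the abstract refers to, the sketch below issues a fill-in-the-middle OpenAPI completion request to Code Llama 7B through Hugging Face Transformers, which exposes the model's infilling mode via the `<FILL_ME>` token. The example document, model checkpoint, and generation settings are illustrative assumptions, not the exact prompts or hyperparameters used in the study.

```python
# Sketch: fill-in-the-middle OpenAPI completion with Code Llama 7B via
# Hugging Face Transformers. The <FILL_ME> token triggers the model's
# infilling mode; the OpenAPI snippet and generation settings below are
# illustrative, not the paper's exact setup.
from transformers import CodeLlamaTokenizer, LlamaForCausalLM

tokenizer = CodeLlamaTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = LlamaForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# Prefix and suffix of a partial OpenAPI definition; the model fills the gap.
prompt = """openapi: 3.0.0
info:
  title: Pet Store API
  version: 1.0.0
paths:
  /pets:
    get:
      summary: List all pets
      responses:
<FILL_ME>
  /pets/{petId}:
    get:
      summary: Get a pet by ID
"""

input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the generated infill (tokens produced after the prompt).
completion = tokenizer.batch_decode(
    output_ids[:, input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(completion)
```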
Benchmarks
| Benchmark | Methodology | Correctness, avg. (%) | Correctness, max. (%) | Validness, avg. (%) | Validness, max. (%) |
|---|---|---|---|---|---|
| openapi-code-completion-on-openapi-code | Code Llama 7B | 31.1 | 36 | 60.7 | 64 |
| openapi-code-completion-on-openapi-code | Code Llama 7B, fine-tuned with document splitting | 34 | 42 | 69.1 | 76 |
| openapi-code-completion-on-openapi-code | GitHub Copilot | 29 | 29 | 68 | 68 |
| openapi-code-completion-on-openapi-code | Code Llama 7B, fine-tuned at 4096 tokens | 32 | 45 | 63.1 | 84 |
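The Validness metric in the table above captures whether a completed document is still a parseable, schema-conformant OpenAPI definition. The sketch below shows one way such a check can be implemented, parsing the completed YAML and validating it with the `openapi-spec-validator` package; this is an assumption about the metric's mechanics, not the benchmark's published implementation.

```python
# Sketch of a validness check for a completed OpenAPI document: parse the YAML
# and validate it against the OpenAPI schema. Illustrative assumption about how
# such a metric can be computed, not the benchmark's exact code.
import yaml
from openapi_spec_validator import validate_spec  # pip install openapi-spec-validator


def is_valid_openapi(document: str) -> bool:
    """Return True if `document` parses as YAML and passes OpenAPI validation."""
    try:
        spec = yaml.safe_load(document)
        validate_spec(spec)
        return True
    except Exception:
        # Any parse or validation error counts as an invalid completion.
        return False


completed = """openapi: 3.0.0
info:
  title: Pet Store API
  version: 1.0.0
paths: {}
"""
print(is_valid_openapi(completed))  # True for this minimal document
```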