Bohdan Petryshyn, Mantas Lukoševičius

Abstract
Recent advancements in Large Language Models (LLMs) and their utilization in code generation tasks have significantly reshaped the field of software development. Despite the remarkable efficacy of code completion solutions in mainstream programming languages, their performance lags when applied to less ubiquitous formats such as OpenAPI definitions. This study evaluates the OpenAPI completion performance of GitHub Copilot, a prevalent commercial code completion tool, and proposes a set of task-specific optimizations leveraging Meta's open-source model Code Llama. A semantics-aware OpenAPI completion benchmark proposed in this research is used to perform a series of experiments through which the impact of various prompt-engineering and fine-tuning techniques on the Code Llama model's performance is analyzed. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot despite utilizing 25 times fewer parameters than the commercial solution's underlying Codex model. Additionally, this research proposes an enhancement to a widely used code infilling training technique, addressing the issue of underperformance when the model is prompted with context sizes smaller than those used during training. The dataset, the benchmark, and the model fine-tuning code are made publicly available.
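As a rough illustration of the infilling-style prompting the abstract refers to, the sketch below issues a fill-in-the-middle OpenAPI completion request to Code Llama 7B through Hugging Face Transformers, which exposes the model's infilling mode via the `<FILL_ME>` token. The example document, model checkpoint, and generation settings are illustrative assumptions, not the exact prompts or hyperparameters used in the study.

```python
# Sketch: fill-in-the-middle OpenAPI completion with Code Llama 7B via
# Hugging Face Transformers. The <FILL_ME> token triggers the model's
# infilling mode; the OpenAPI snippet and generation settings below are
# illustrative, not the paper's exact setup.
from transformers import CodeLlamaTokenizer, LlamaForCausalLM

tokenizer = CodeLlamaTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = LlamaForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# Prefix and suffix of a partial OpenAPI definition; the model fills the gap.
prompt = """openapi: 3.0.0
info:
  title: Pet Store API
  version: 1.0.0
paths:
  /pets:
    get:
      summary: List all pets
      responses:
<FILL_ME>
  /pets/{petId}:
    get:
      summary: Get a pet by ID
"""

input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the generated infill (tokens produced after the prompt).
completion = tokenizer.batch_decode(
    output_ids[:, input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(completion)
```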
Benchmarks
| Benchmark | Methodology | Correctness, avg. (%) | Correctness, max. (%) | Validness, avg. (%) | Validness, max. (%) |
|---|---|---|---|---|---|
| openapi-code-completion-on-openapi-code | Code Llama 7B | 31.1 | 36 | 60.7 | 64 |
| openapi-code-completion-on-openapi-code | Code Llama 7B, fine-tuned with document splitting | 34 | 42 | 69.1 | 76 |
| openapi-code-completion-on-openapi-code | GitHub Copilot | 29 | 29 | 68 | 68 |
| openapi-code-completion-on-openapi-code | Code Llama 7B, fine-tuned at 4096 tokens | 32 | 45 | 63.1 | 84 |
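The Validness metric in the table above captures whether a completed document is still a parseable, schema-conformant OpenAPI definition. The sketch below shows one way such a check can be implemented, parsing the completed YAML and validating it with the `openapi-spec-validator` package; this is an assumption about the metric's mechanics, not the benchmark's published implementation.

```python
# Sketch of a validness check for a completed OpenAPI document: parse the YAML
# and validate it against the OpenAPI schema. Illustrative assumption about how
# such a metric can be computed, not the benchmark's exact code.
import yaml
from openapi_spec_validator import validate_spec  # pip install openapi-spec-validator


def is_valid_openapi(document: str) -> bool:
    """Return True if `document` parses as YAML and passes OpenAPI validation."""
    try:
        spec = yaml.safe_load(document)
        validate_spec(spec)
        return True
    except Exception:
        # Any parse or validation error counts as an invalid completion.
        return False


completed = """openapi: 3.0.0
info:
  title: Pet Store API
  version: 1.0.0
paths: {}
"""
print(is_valid_openapi(completed))  # True for this minimal document
```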