How Does Naming Affect LLMs on Code Analysis Tasks?

Zhilong Wang, Lan Zhang, Chen Cao, Nanqing Luo, Xinzhi Luo, Peng Liu

Abstract

Large Language Models (LLMs), such as GPT and BERT, were proposed for natural language processing (NLP) and have shown promising results as general-purpose language models. A growing number of industry professionals and researchers are adopting LLMs for program analysis tasks. However, one significant difference between programming languages and natural languages is that a programmer is free to assign any name to the variables, methods, and functions in a program, whereas a natural language writer has no such freedom. Intuitively, the quality of naming in a program should therefore affect the performance of LLMs on program analysis tasks. This paper investigates how naming affects LLMs on code analysis tasks. Specifically, we create a set of datasets in which the code contains nonsense or misleading names for variables, methods, and functions, respectively. We then use well-trained models (CodeBERT) to perform code analysis tasks on these datasets. The experimental results show that naming has a significant impact on the performance of LLM-based code analysis tasks, indicating that LLM-based code representation learning relies heavily on well-defined names in code. Additionally, we conduct a case study on some special code analysis tasks using GPT, which provides further insights.
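To make the idea of a "nonsense names" dataset concrete, the following is a minimal sketch of one possible transformation. It is not the authors' tooling, which the abstract does not describe. It assumes Python source, Python 3.9+ (for `ast.unparse`), and a hypothetical helper `anonymize_names` that rewrites parameters and assigned variables to opaque `var_N` placeholders while leaving function names and builtins untouched.

```python
import ast


def anonymize_names(source: str) -> str:
    """Replace parameter and assigned-variable names with opaque placeholders.

    Illustrative assumption only: the `var_N` naming scheme and this helper
    are hypothetical and do not come from the paper.
    """
    tree = ast.parse(source)

    # Pass 1: collect every name that is bound somewhere in the code,
    # i.e. function parameters and assignment / loop targets.
    mapping = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            mapping.setdefault(node.arg, f"var_{len(mapping)}")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            mapping.setdefault(node.id, f"var_{len(mapping)}")

    # Pass 2: rewrite every occurrence (definition and use) of those names.
    # Names that are never bound here, such as builtins, are left as-is.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg) and node.arg in mapping:
            node.arg = mapping[node.arg]
        elif isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]

    return ast.unparse(tree)


if __name__ == "__main__":
    sample = (
        "def average(values):\n"
        "    total = 0\n"
        "    for item in values:\n"
        "        total += item\n"
        "    return total / len(values)\n"
    )
    print(anonymize_names(sample))
```

The rewritten function behaves identically but no longer carries any hint of its purpose in its variable names, which is the kind of input the experiments contrast with well-named code. The sketch ignores keyword arguments at call sites, `global`/`nonlocal` declarations, and method or function names, so it covers only the variable-renaming case.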

Benchmarks

Benchmark                 Methodology                     Metrics
code-generation-on-mbpp   GPT-4 (ChatGPT Plus)            Accuracy: 87.5
code-generation-on-mbpp   Claude                          Accuracy: 71.4
code-generation-on-mbpp   Bard (PaLM 2/chat-bison-001)    Accuracy: 76.2
code-generation-on-mbpp   GPT-4 (Bing Chat)               Accuracy: 82
code-generation-on-mbpp   GPT-3.5 Turbo (ChatGPT)         Accuracy: 83.2
