HyperAI

4 months ago

QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks

Hu Yaojie; Zhou Qiang; Chen Qihong; Li Xiaopeng; Liu Linbo; Zhang Dejiao; Kachroo Amit; Oz Talha; Tripp Omer

Abstract

We introduce QualityFlow, a dynamic agentic workflow for program synthesis. Given the English description of a programming problem and a set of unit tests, the model's goal is to synthesize the correct program that solves the problem and passes the tests. QualityFlow includes large language model (LLM) agents resembling a software development team, with roles for code generation, testing, and self-debugging. We propose the LLM Quality Checker, which explicitly "imagines" whether the synthesized programs' execution would conform to the unit tests. The Quality Checks dynamically control the workflow, including actions to submit the final answer, clarify the problem statement, and revert previous workflow steps. Our experiments show that the Quality Checker can precisely accept any correct program, mitigate faulty synthesized tests, and prevent potential workflow deviation. QualityFlow establishes state-of-the-art results on four program synthesis benchmarks: MBPP, HumanEval, and the stricter evaluations from MBPP-EvalPlus and HumanEval-EvalPlus.
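The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Agents` fields, the `quality_flow` function, the fixed order of the two checks, and reverting to the first generated program are all hypothetical simplifications, and every LLM agent is replaced by a deterministic stub. In particular, the demo checker `run_tests_checker` actually executes the tests, whereas the Quality Checker described above has an LLM imagine the execution instead of running it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    ACCEPT = "accept"  # checker predicts the program passes every unit test
    REJECT = "reject"  # checker predicts at least one unit test fails


@dataclass
class Agents:
    """Hypothetical interfaces; in the described workflow each role is an LLM agent."""
    generate: Callable[[str], str]                           # statement -> program
    design_tests: Callable[[str], List[str]]                 # statement -> synthesized tests
    debug: Callable[[str, str, List[str]], str]              # (statement, program, tests) -> program
    clarify: Callable[[str], str]                            # statement -> restated statement
    quality_check: Callable[[str, str, List[str]], Verdict]  # (statement, program, tests) -> verdict


def quality_flow(problem: str, tests: List[str], agents: Agents,
                 max_rounds: int = 2) -> Optional[str]:
    """Generate -> check -> debug -> check, clarifying the problem between rounds."""
    statement = problem
    first_program: Optional[str] = None  # kept so the workflow can revert later
    for _ in range(max_rounds):
        program = agents.generate(statement)
        if first_program is None:
            first_program = program
        # Check 1: submit immediately if the checker accepts the generated code.
        if agents.quality_check(statement, program, tests) is Verdict.ACCEPT:
            return program
        # Self-debug against synthesized tests, which may themselves be faulty.
        synthesized = agents.design_tests(statement)
        revised = agents.debug(statement, program, synthesized)
        # Check 2: the checker, not the synthesized tests, decides whether to submit.
        if agents.quality_check(statement, revised, tests) is Verdict.ACCEPT:
            return revised
        # Both candidates rejected: restate the problem and start a new round.
        statement = agents.clarify(statement)
    # Nothing was accepted: revert to the initial generation (an assumed policy).
    return first_program


def run_tests_checker(statement: str, program: str, tests: List[str]) -> Verdict:
    # Demo stand-in only: this EXECUTES the tests, while the described
    # Quality Checker has an LLM imagine the execution without running it.
    scope: dict = {}
    try:
        exec(program, scope)
        for test in tests:
            exec(test, scope)
    except Exception:
        return Verdict.REJECT
    return Verdict.ACCEPT


BUGGY = "def add(a, b):\n    return a - b\n"
FIXED = "def add(a, b):\n    return a + b\n"

# Deterministic stubs standing in for the LLM agents.
demo = Agents(
    generate=lambda s: BUGGY,
    design_tests=lambda s: ["assert add(2, 2) == 4"],
    debug=lambda s, p, t: FIXED,
    clarify=lambda s: s + " (restated)",
    quality_check=run_tests_checker,
)

print(quality_flow("Return the sum of a and b.", ["assert add(1, 2) == 3"], demo))
```

Here the buggy first draft is rejected at the first check, the stub debugger returns the fixed version, and the second check accepts it. The point the sketch tries to capture is that the checker gates every submission, so a faulty synthesized test passed to the debugger cannot by itself push a bad program through.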

Benchmarks

Benchmark | Methodology | Metric
code-generation-on-humaneval | QualityFlow (Sonnet-3.5) | Pass@1: 98.8
code-generation-on-mbpp | QualityFlow (Sonnet-3.5) | Accuracy: 94.2