Multi-agent Workflow CudaForge
CudaForge was proposed in October 2025 by a research team at the University of Minnesota in the paper "CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization".
CudaForge is a training-free multi-agent workflow for CUDA kernel generation and optimization, modeled on the iterative workflow of human experts: develop an initial kernel, test it for correctness, analyze hardware feedback, and improve it over repeated rounds. Specifically, CudaForge employs two LLM agents, a Coder and a Judge. Together they iteratively generate, correct, and optimize CUDA kernels, using hardware feedback such as Nsight Compute (NCU) metrics to guide each round.
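The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CudaForge implementation: the function `forge_loop`, the two feedback modes ("correction" and "optimization"), the `runtime_ms` metric key, and all four callables are hypothetical names introduced here. In a real setup the Coder and Judge would be LLM calls, `run_tests` would compile the kernel and compare it against a reference, and `profile` would collect NCU metrics; the stubs below only stand in for them so the control flow can be run.

```python
def forge_loop(coder, judge, run_tests, profile, max_rounds=5):
    """Hypothetical Coder/Judge loop (an assumption, not the paper's code).

    Returns (kernel, runtime_ms) for the fastest kernel that passed the
    correctness tests, or None if no round produced a correct kernel.
    """
    kernel = None     # no kernel yet: the Coder writes the initial version
    feedback = None   # the Judge's advice from the previous round
    best = None
    for _ in range(max_rounds):
        kernel = coder(kernel, feedback)           # generate or revise
        passed, error_log = run_tests(kernel)      # correctness check
        if not passed:
            # Incorrect kernel: the Judge reads the error log and asks for a fix.
            feedback = judge("correction", kernel, error_log)
            continue
        metrics = profile(kernel)                  # hardware feedback
        if best is None or metrics["runtime_ms"] < best[1]:
            best = (kernel, metrics["runtime_ms"])
        # Correct kernel: the Judge reads the metrics and suggests an optimization.
        feedback = judge("optimization", kernel, metrics)
    return best


# Stub agents and tools, purely illustrative. Kernels are version strings.
def stub_coder(previous, feedback):
    return "v0" if previous is None else f"v{int(previous[1:]) + 1}"


def stub_run_tests(kernel):
    # Pretend the first attempt is wrong and every revision is correct.
    if kernel == "v0":
        return False, "output mismatch"
    return True, ""


def stub_profile(kernel):
    # Pretend each revision is one millisecond faster than the last.
    return {"runtime_ms": 10.0 - int(kernel[1:])}


def stub_judge(mode, kernel, evidence):
    return f"{mode}: {evidence}"


result = forge_loop(stub_coder, stub_judge, stub_run_tests, stub_profile, max_rounds=4)
```

Keeping the fastest correct kernel seen so far, rather than the last one produced, is a design choice of this sketch: a later revision may fail its tests or run slower, and the loop should not lose an earlier good result.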
