The paper introduces Blueprint2Code, a multi-agent framework designed to improve code generation by mimicking the human programming workflow through task comprehension, planning, implementation, and iterative refinement. This framework utilizes four interacting agents (Previewing, Blueprint, Coding, and Debugging) to address the limitations of LLMs in complex programming tasks requiring multi-step reasoning and reliable code generation. Experiments on the HumanEval, MBPP, and APPS datasets demonstrate that Blueprint2Code achieves state-of-the-art pass@1 results, significantly outperforming existing methods, especially on extended and more challenging versions of the benchmarks.
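The results above are reported as pass@1, the fraction of tasks solved by a model's first sampled program. When more than one sample is drawn per task, pass@k is usually computed with the standard unbiased estimator (Chen et al., 2021) rather than raw counting; a minimal sketch, not taken from this paper:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n = samples generated per task,
    c = samples that pass all tests, k = budget being evaluated."""
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples of which 4 pass, pass@1 reduces to the plain pass rate:
print(pass_at_k(10, 4, 1))  # → 0.4
```

A benchmark score is then the mean of this quantity over all tasks in the dataset.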
A multi-agent system that decomposes code generation into task analysis, planning, coding, and debugging achieves state-of-the-art pass@1 scores (96.3% on HumanEval), suggesting a promising path toward reliable automated programming.
Automated programming has become a powerful tool for solving real-world problems. Code generation, in particular, plays a key role in improving developer productivity and lowering the entry barrier to software development. Recent advances in large language models (LLMs) have significantly improved program synthesis, enabling high-quality code generation from natural language. However, LLMs still struggle with complex tasks, especially in understanding problem intent, conducting multi-step reasoning, and producing code that passes all test cases. As task difficulty increases, existing models often fail to devise complete and reliable generation strategies, leading to reduced accuracy and robustness. To address these limitations, we propose Blueprint2Code, an innovative multi-agent framework for code generation. It emulates the human programming workflow through the coordinated interaction of four agents (Previewing, Blueprint, Coding, and Debugging), forming a closed-loop system from task comprehension to planning, implementation, and iterative refinement. Compared to existing methods, Blueprint2Code shows superior performance on complex programming tasks. Extensive experiments on benchmark datasets (HumanEval, MBPP, their extended versions HumanEval-ET and MBPP-ET, and the APPS competition dataset) demonstrate its effectiveness, achieving strong pass@1 results: HumanEval 96.3%, MBPP 88.4%, HumanEval-ET 86.5%, MBPP-ET 59.4%, and APPS 24.6%. The related code is available at https://github.com/MKH99918/Blueprint2Code.
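The closed loop described above (comprehension, planning, implementation, iterative refinement) can be sketched as a simple orchestration of four agent roles. This is an illustrative sketch only: the `llm` and `run_tests` callables and all prompts are hypothetical placeholders, not the paper's actual agents or prompt templates.

```python
from typing import Callable, Tuple

def run_pipeline(task: str,
                 llm: Callable[[str], str],
                 run_tests: Callable[[str], Tuple[bool, str]],
                 max_debug_rounds: int = 3) -> str:
    """Closed-loop, four-role code generation in the spirit of Blueprint2Code."""
    # 1. Previewing agent: restate the task to surface intent and edge cases.
    analysis = llm(f"Analyze this programming task and its edge cases:\n{task}")
    # 2. Blueprint agent: produce a step-by-step plan before writing any code.
    plan = llm(f"Task: {task}\nAnalysis: {analysis}\nWrite a step-by-step plan.")
    # 3. Coding agent: implement the plan.
    code = llm(f"Implement this plan as Python code:\n{plan}")
    # 4. Debugging agent: refine iteratively until the tests pass
    #    or the round budget is exhausted.
    for _ in range(max_debug_rounds):
        ok, feedback = run_tests(code)
        if ok:
            break
        code = llm(f"Fix this code given the test feedback:\n{code}\n{feedback}")
    return code
```

Feeding test feedback back into the model is what closes the loop; without the debugging round, the pipeline degenerates to single-shot plan-then-code generation.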