The paper introduces ProjDevBench, a new benchmark for evaluating AI coding agents on end-to-end project development, addressing the limitations of existing issue-level evaluations. ProjDevBench assesses agents on system architecture design, functional correctness, and iterative refinement using a combination of online judge testing and LLM-assisted code review across 20 programming problems. Experiments with six coding agents reveal an overall acceptance rate of 27.38%, highlighting challenges in complex system design, time complexity optimization, and resource management.
Coding agents can generate codebases, but they still fail at complex system design, time complexity optimization, and resource management in end-to-end project development.
Recent coding agents can generate complete codebases from simple prompts, yet existing evaluations focus on issue-level bug fixing and fall short of assessing end-to-end development. We introduce ProjDevBench, an end-to-end benchmark that gives coding agents full project requirements and evaluates the repositories they produce. Combining Online Judge (OJ) testing with LLM-assisted code review, the benchmark evaluates agents on (1) system architecture design, (2) functional correctness, and (3) iterative solution refinement. We curate 20 programming problems across 8 categories, covering both concept-oriented tasks and real-world application scenarios, and evaluate six coding agents built on different LLM backends. Our evaluation reports an overall acceptance rate of 27.38%: agents handle basic functionality and data structures but struggle with complex system design, time complexity optimization, and resource management. Our benchmark is available at https://github.com/zsworld6/projdevbench.
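The headline 27.38% figure is an aggregate over agent-problem submissions judged by the OJ. As a minimal sketch of how such a rate could be computed, the snippet below uses made-up verdict data and a hypothetical `acceptance_rate` helper; it is not the paper's actual evaluation code or results.

```python
# Hypothetical sketch: aggregating an overall acceptance rate from
# per-(agent, problem) OJ verdicts. The verdict grid is illustrative
# (3 agents x 4 problems); the real benchmark uses 6 agents x 20 problems.

def acceptance_rate(verdicts):
    """Fraction of (agent, problem) submissions judged Accepted ("AC")."""
    total = len(verdicts)
    accepted = sum(1 for v in verdicts.values() if v == "AC")
    return accepted / total if total else 0.0

# Toy data: OJ-style verdicts (AC = accepted, WA = wrong answer,
# TLE = time limit exceeded).
verdicts = {
    ("agent_a", "p1"): "AC",  ("agent_a", "p2"): "WA",
    ("agent_a", "p3"): "TLE", ("agent_a", "p4"): "AC",
    ("agent_b", "p1"): "AC",  ("agent_b", "p2"): "WA",
    ("agent_b", "p3"): "WA",  ("agent_b", "p4"): "WA",
    ("agent_c", "p1"): "WA",  ("agent_c", "p2"): "AC",
    ("agent_c", "p3"): "WA",  ("agent_c", "p4"): "WA",
}

print(f"{acceptance_rate(verdicts):.2%}")  # prints 33.33% (4 AC of 12)
```

Counting each agent-problem pair as one submission keeps the metric comparable across agents; a per-agent breakdown would simply filter the verdict keys by agent before averaging.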