Notre Dame, Vanderbilt | May 28, 2026 | arXiv:2605.29442

How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions

Ningzhi Tang, Chaoran Chen, Gelei Xu, Yiyu Shi, Yu Huang, Collin McMillan, Tao Dong, T. Li

AI Summary

This paper presents a large-scale observational study of 20,574 real-world coding-agent sessions to understand how these agents fail to align with developer intent. The authors annotate misalignment episodes along four axes (form, cause, cost, resolution) and identify seven recurring forms, spanning project understanding, intent interpretation, rule following, action scoping, code implementation and execution, and progress reporting. Most episodes (90.50%) impose effort and trust costs rather than irreversible system damage, yet 91.49% of visible resolutions still require explicit user correction, highlighting the need for improved training, evaluation, and interfaces.

Key Contribution

Overall misalignment rates decline over time, but constraint violations and inaccurate self-reporting make up a growing share of the episodes that remain, suggesting that current training approaches do not fully address these aspects of developer alignment.

Abstract

AI coding agents increasingly act directly within software environments, yet existing analyses of their failures rely on benchmark trajectories that miss how developers actually experience misalignment. We present an observational study of 20,574 coding-agent sessions from 1,639 repositories across IDE and CLI workflows. We operationalize misalignment as a breakdown made visible through developer pushback, and annotate each episode along four axes: form, cause, cost, and resolution. We identify seven recurring forms, spanning how agents read projects, interpret developer intent, follow rules, bound their actions, implement and execute code, and report progress. 90.50% of episodes impose effort and trust costs rather than irreversible system damage, yet 91.49% of visible resolutions still require explicit user correction. Misalignment patterns also differ across IDE and CLI settings, persist across adjacent sessions, and shift over time: while overall rates decline, constraint violations and inaccurate self-reporting grow in share. Our findings inform the design of training, evaluation, and interfaces for keeping coding agents aligned with real developer workflows.
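The distinction between a declining overall rate and a growing share can be made concrete with a small sketch. The code below is an illustration only, not the authors' pipeline: the `Episode` record mirrors the four annotation axes named in the abstract, while the label strings, the helper names (`episode_rate`, `form_share`, `make`), and all counts are hypothetical toy values chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One misalignment episode, annotated along the four axes from the abstract."""
    form: str        # hypothetical label, e.g. "constraint_violation"
    cause: str
    cost: str
    resolution: str


def episode_rate(episodes, n_sessions):
    """Episodes per session within one time window."""
    return len(episodes) / n_sessions


def form_share(episodes, form):
    """Fraction of a window's episodes that carry the given form label."""
    if not episodes:
        return 0.0
    return Counter(e.form for e in episodes)[form] / len(episodes)


def make(form, count):
    """Build `count` placeholder episodes of one form (toy data only)."""
    return [Episode(form, "unspecified", "effort", "user_correction")] * count


# Toy numbers, NOT taken from the paper: 200 sessions in each window.
early = make("constraint_violation", 10) + make("other", 90)
late = make("constraint_violation", 12) + make("other", 48)

for name, window in (("early", early), ("late", late)):
    rate = episode_rate(window, 200)
    share = form_share(window, "constraint_violation")
    print(f"{name}: rate={rate:.2f} episodes/session, share={share:.0%}")
```

In this toy data the overall rate falls between the two windows while the share of constraint violations rises, which is the pattern the abstract describes. A growing share therefore does not by itself mean that a form is becoming more frequent per session.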

Code Generation & Program Synthesis | Eval Frameworks & Benchmarks | Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 42
Year: 2026
Venue: N/A
