This paper presents a large-scale observational study of 20,574 real-world coding agent sessions to understand how these agents fail to align with developer intent. The authors annotate misalignment episodes along four axes (form, cause, cost, resolution) and identify seven recurring failure modes, including issues with project understanding, intent interpretation, and code execution. The analysis shows that most misalignment episodes impose effort and trust costs rather than irreversible system damage, yet nearly all visible resolutions still require explicit user correction, highlighting the need for improved training, evaluation, and interfaces.
Overall misalignment rates for coding agents are declining, but constraint violations and inaccurate progress reports account for a growing share of the failures that remain, suggesting that current training approaches do not fully address these aspects of developer alignment.
AI coding agents increasingly act directly within software environments, yet existing analyses of their failures rely on benchmark trajectories that miss how developers actually experience misalignment. We present an observational study of 20,574 coding-agent sessions from 1,639 repositories across IDE and CLI workflows. We operationalize misalignment as a breakdown made visible through developer pushback, and annotate each episode along four axes: form, cause, cost, and resolution. We identify seven recurring forms, spanning how agents read projects, interpret developer intent, follow rules, bound their actions, implement and execute code, and report progress. 90.50% of episodes impose effort and trust costs rather than irreversible system damage, yet 91.49% of visible resolutions still require explicit user correction. Misalignment patterns also differ across IDE and CLI settings, persist across adjacent sessions, and shift over time: while overall rates decline, constraint violations and inaccurate self-reporting grow in share. Our findings inform the design of training, evaluation, and interfaces for keeping coding agents aligned with real developer workflows.