This paper analyzes the expectation-realization gap in agentic AI systems across software engineering, clinical documentation, and clinical decision support. It finds substantial discrepancies between expected and actual performance: developers overestimate speedups, and vendors overstate time savings. The analysis identifies workflow integration friction, verification burden, and measurement mismatches as key drivers of these shortfalls.
Agentic AI systems often fall well short of expectations. Real-world deployments reveal productivity losses and overstated claims that should give researchers and practitioners pause.
Agentic AI systems are deployed with expectations of substantial productivity gains, yet rigorous empirical evidence reveals systematic discrepancies between pre-deployment expectations and post-deployment outcomes. We review controlled trials and independent validations across software engineering, clinical documentation, and clinical decision support to quantify this expectation-realisation gap. In software development, experienced developers expected a 24% speedup from AI tools but were slowed by 19%, a 43 percentage-point calibration error. In clinical documentation, vendor claims of multi-minute time savings contrast with measured reductions of less than one minute per note, and one widely deployed tool showed no statistically significant effect. In clinical decision support, externally validated performance falls substantially below developer-reported metrics. These shortfalls are driven by workflow integration friction, verification burden, measurement construct mismatches, and systematic heterogeneity in treatment effects. The evidence motivates structured planning frameworks that require explicit, quantified benefit expectations with human oversight costs factored in.
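The 43 percentage-point figure comes from taking the difference between the expected and the realized effect once both are on a single signed scale. The minimal sketch below assumes speedups are positive and slowdowns are negative. The function name `calibration_gap_pp` and this sign convention are illustrative assumptions and are not taken from the paper.

```python
def calibration_gap_pp(expected_pct: float, realized_pct: float) -> float:
    """Return expected minus realized effect, in percentage points.

    Assumed convention (illustrative): positive = speedup, negative = slowdown.
    """
    return expected_pct - realized_pct


# Figures from the abstract: forecast +24% speedup, measured 19% slowdown.
expected_speedup = 24.0
realized_effect = -19.0

gap = calibration_gap_pp(expected_speedup, realized_effect)
print(f"calibration gap: {gap:.0f} percentage points")
```

Because the realized effect is negative, subtracting it adds 19 to the 24-point forecast, which gives 43. A forecast that matched the outcome exactly would produce a gap of zero.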