The paper investigates the effectiveness of prompt optimization in compound AI systems, finding it statistically no better than random chance in many cases. Through extensive experiments on Claude Haiku and Amazon Nova Lite, the authors show that prompt optimization primarily benefits tasks with exploitable output structure, i.e. a format the model can produce but does not adopt by default. They further demonstrate that agent prompts do not significantly interact, and they provide a diagnostic tool to predict whether prompt optimization will be worthwhile.
End-to-end prompt optimization is often a waste of time and money, succeeding only when it coaxes models into specific output formats they are already capable of producing.
Prompt optimization in compound AI systems is statistically indistinguishable from a coin flip: across 72 optimization runs on Claude Haiku (6 methods $\times$ 4 tasks $\times$ 3 repeats), 49% score below zero-shot; on Amazon Nova Lite, the failure rate is even higher. Yet on one task, all six methods improve over zero-shot by up to $+6.8$ points. What distinguishes success from failure? We investigate with 18,000 grid evaluations and 144 optimization runs, testing two assumptions behind end-to-end optimization tools like TextGrad and DSPy: (A) individual prompts are worth optimizing, and (B) agent prompts interact, requiring joint optimization. Interaction effects are never significant ($p>0.52$, all $F<1.0$), and optimization helps only when the task has exploitable output structure -- a format the model can produce but does not default to. We provide a two-stage diagnostic: an \$80 ANOVA pre-test for agent coupling, and a 10-minute headroom test that predicts whether optimization is worthwhile -- turning a coin flip into an informed decision.
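The abstract's agent-coupling pre-test is described only as an ANOVA over agent prompts. A minimal sketch of one plausible implementation follows, assuming a two-agent pipeline scored over a grid of prompt pairs; the agent names (`planner`, `executor`), the `run_pipeline` scorer, and the seed count are hypothetical placeholders, not details from the paper.

```python
# Two-way ANOVA pre-test for agent coupling: does the (planner, executor)
# interaction term explain any score variance beyond the main effects?
import itertools
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def coupling_pretest(planner_prompts, executor_prompts, run_pipeline, n_seeds=3):
    """Score every prompt pair, then test the interaction term's significance."""
    rows = []
    for p, e in itertools.product(planner_prompts, executor_prompts):
        for seed in range(n_seeds):
            rows.append({"planner": p, "executor": e,
                         "score": run_pipeline(p, e, seed=seed)})
    df = pd.DataFrame(rows)

    # Fit score ~ planner + executor + planner:executor and extract the
    # F statistic and p-value of the interaction row.
    model = ols("score ~ C(planner) * C(executor)", data=df).fit()
    table = sm.stats.anova_lm(model, typ=2)
    interaction = table.loc["C(planner):C(executor)"]
    return interaction["F"], interaction["PR(>F)"]
```

If the interaction p-value is large (the paper reports $p>0.52$ with all $F<1.0$), joint end-to-end optimization is unlikely to beat tuning each agent's prompt independently.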