This paper introduces Oracle-SWE, a methodology to isolate and quantify the individual contributions of different information signals (Reproduction Test, Regression Test, Edit Location, Execution Context, and API Usage) to software engineering agent performance. By extracting "oracle" versions of these signals from SWE benchmarks, the authors determine the upper-bound performance gain achievable with perfect intermediate information. Experiments reveal the relative importance of each signal, indicating which areas offer the most potential for improving autonomous coding systems.
Knowing the *perfect* API to use or *exact* location to edit could drastically improve SWE agent performance, but knowing the perfect regression test result? Not so much.
Recent advances in language model (LM) agents have significantly improved automated software engineering (SWE). Prior work has proposed various agentic workflows and training strategies and has analyzed failure modes of agentic systems on SWE tasks, focusing on several contextual information signals: Reproduction Test, Regression Test, Edit Location, Execution Context, and API Usage. However, the individual contribution of each signal to overall success remains underexplored, particularly its ideal contribution when the intermediate information is obtained perfectly. To address this gap, we introduce Oracle-SWE, a unified method to isolate and extract oracle information signals from SWE benchmarks and quantify the impact of each signal on agent performance. To further validate the pattern, we evaluate the performance gain of signals extracted by strong LMs when provided to a base agent, approximating real-world task-resolution settings. These evaluations aim to guide research prioritization for autonomous coding systems.
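One way to picture the per-signal measurement is as a simple ablation: run a base agent once with no extra information, then once per signal with that signal's oracle version supplied, and compare resolve rates. The sketch below assumes this framing. The function names, the signal keys, and the pass/fail lists are hypothetical placeholders for illustration, not the authors' code or results.

```python
from typing import Dict, List


def resolve_rate(outcomes: List[bool]) -> float:
    """Fraction of benchmark tasks the agent resolved."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


def oracle_gains(
    baseline: List[bool],
    with_oracle: Dict[str, List[bool]],
) -> Dict[str, float]:
    """Gain in resolve rate when one oracle signal is given to the base agent.

    `baseline` holds per-task outcomes without any oracle signal;
    `with_oracle[signal]` holds outcomes on the same tasks with that
    single signal supplied.
    """
    base = resolve_rate(baseline)
    return {name: resolve_rate(runs) - base for name, runs in with_oracle.items()}


# Made-up toy outcomes on four tasks (placeholders, NOT data from the paper).
toy_baseline = [True, False, False, False]
toy_with_oracle = {
    "edit_location": [True, True, False, False],
    "regression_test": [True, False, False, False],
}

gains = oracle_gains(toy_baseline, toy_with_oracle)
# Rank signals by how much they help in this toy setup.
ranking = sorted(gains, key=gains.get, reverse=True)
```

Under this framing, the same `oracle_gains` call could also be applied to signals extracted by a strong LM rather than taken from the benchmark, giving the approximate, real-world counterpart of the oracle upper bound.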