Microsoft Research | Georgia Tech | Apr 9, 2026 | arXiv:2604.07789

ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents

Kenan Li, Qirui Jin, Liao Zhu, Xiaosong Huang, Yijia Wu, Yijian Wu, Yikai Zhang, Xin Zhang, Zijian Jin, Yufan Huang, Elsie Nallipogu, Chao Zhang, Chaoyun Zhang, Yu Kang, Saravan Rajmohan, S. Rajmohan, Qingwei Lin, Wenke Lee, Dongmei Zhang

AI Summary

This paper introduces Oracle-SWE, a methodology to isolate and quantify the individual contributions of different information signals (Reproduction Test, Regression Test, Edit Location, Execution Context, and API Usage) on software engineering agent performance. By extracting "oracle" versions of these signals from SWE benchmarks, the authors determine the upper-bound performance gain achievable with perfect intermediate information. Experiments reveal the relative importance of each signal, providing insights into which areas offer the most potential for improving autonomous coding systems.

Key Contribution

Knowing the *perfect* API to use or *exact* location to edit could drastically improve SWE agent performance, but knowing the perfect regression test result? Not so much.

Abstract

Recent advances in language model (LM) agents have significantly improved automated software engineering (SWE). Prior work has proposed various agentic workflows and training strategies, as well as analyzed failure modes of agentic systems on SWE tasks, focusing on several contextual information signals: Reproduction Test, Regression Test, Edit Location, Execution Context, and API Usage. However, the individual contribution of each signal to overall success remains underexplored, particularly its ideal contribution when intermediate information is perfectly obtained. To address this gap, we introduce Oracle-SWE, a unified method to isolate and extract oracle information signals from SWE benchmarks and quantify the impact of each signal on agent performance. To further validate the pattern, we evaluate the performance gain of signals extracted by strong LMs when provided to a base agent, approximating real-world task-resolution settings. These evaluations aim to guide research prioritization for autonomous coding systems.
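One way to picture the per-signal measurement the abstract describes is as a difference in resolve rate between a baseline agent and the same agent given a single oracle signal. The sketch below is illustrative only and is not the authors' implementation: the function names, the task IDs, and the toy outcomes are hypothetical placeholders, and the numbers are not results from the paper.

```python
from typing import Dict, Set


def resolve_rate(resolved: Set[str], all_tasks: Set[str]) -> float:
    """Fraction of benchmark tasks that were resolved."""
    if not all_tasks:
        raise ValueError("all_tasks must be non-empty")
    return len(resolved & all_tasks) / len(all_tasks)


def signal_gains(
    baseline_resolved: Set[str],
    oracle_resolved: Dict[str, Set[str]],
    all_tasks: Set[str],
) -> Dict[str, float]:
    """Gain in resolve rate from adding one oracle signal at a time.

    oracle_resolved maps a signal name to the set of tasks resolved when
    the agent is given only that oracle signal on top of the baseline.
    """
    base = resolve_rate(baseline_resolved, all_tasks)
    return {
        signal: resolve_rate(resolved, all_tasks) - base
        for signal, resolved in oracle_resolved.items()
    }


# Toy, made-up outcomes (hypothetical task IDs, not data from the paper).
all_tasks = {"t1", "t2", "t3", "t4"}
baseline = {"t1"}
with_oracle = {
    "Edit Location": {"t1", "t2", "t3"},
    "Regression Test": {"t1"},
}

gains = signal_gains(baseline, with_oracle, all_tasks)
for signal, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{signal}: {gain:+.2f}")
```

Ranking signals by this gain gives a simple ordering of which kind of intermediate information is most worth improving, under the assumption that each signal is evaluated in isolation.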

Code Generation & Program Synthesis | Eval Frameworks & Benchmarks | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 50

Year: 2026

Venue: N/A
