This paper asks how far repository-level automated program repair (APR) can be improved through better localization alone, using SWE-bench Lite and three RAG-APR systems: Agentless, KGCompass, and ExpeRepair. Using Oracle Localization, within-pool Best-of-K, and fixed-interface added-context probes with same-token filler controls and same-repository hard negatives, the study measures how much post-localization levers such as candidate diversity and added context can still recover. Even with Oracle Localization, success rates stay below 50%. Candidate diversity helps but saturates quickly within the sampled 10-patch pools, most informative context conditions outperform their matched controls, and the best fixed probe adds only 6 solved instances beyond the native three-system Solved@10 union, leaving a large residual frontier beyond localization.
Even with perfect bug localization, repository-level program repair fails more than half the time, which points to evidence quality and interface design as the remaining levers.
Repository-level automated program repair (APR) increasingly treats stronger localization as the main path to better repair. We ask a more targeted question: once localization is strengthened, which post-localization levers still provide recoverable gains, which are bounded within our protocol, and what residual frontier remains? We study this question on SWE-bench Lite with three representative repository-level RAG-APR paradigms: Agentless, KGCompass, and ExpeRepair. Our protocol combines Oracle Localization, within-pool Best-of-K, fixed-interface added-context probes with per-condition same-token filler controls and same-repository hard negatives, and a common-wrapper oracle check. Oracle Localization improves all three systems, but Oracle success stays below 50%. Extra candidate diversity helps inside the sampled 10-patch pools, but that headroom saturates quickly. Under the two fixed interfaces, most informative added-context conditions outperform their own matched controls. The common-wrapper check shows that the systems respond differently: gains remain large for KGCompass and ExpeRepair, while Agentless changes more with builder choice. Prompt-level fusion still leaves a large residual frontier: the best fixed probe adds only 6 solved instances beyond the native three-system Solved@10 union. Overall, stronger localization, bounded search, evidence quality, and interface design all shape repository-level repair outcomes.
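To make the Solved@K and union bookkeeping in the abstract concrete, here is a minimal Python sketch. It assumes that Solved@K counts an instance as solved when at least one of its first K sampled patches passes, which may differ from the paper's exact within-pool Best-of-K procedure. The function names, the data layout, and the toy instance ids are all hypothetical and carry no results from the paper.

```python
from typing import Dict, List, Set

# Hypothetical layout: instance id -> pass/fail flag for each sampled patch.
PatchPool = Dict[str, List[bool]]


def solved_at_k(pool: PatchPool, k: int) -> Set[str]:
    """Instances where at least one of the first k candidate patches passes."""
    return {iid for iid, flags in pool.items() if any(flags[:k])}


def union_solved_at_k(pools: Dict[str, PatchPool], k: int) -> Set[str]:
    """Instances solved by at least one system within its first k patches."""
    solved: Set[str] = set()
    for pool in pools.values():
        solved |= solved_at_k(pool, k)
    return solved


def added_beyond_union(probe_solved: Set[str], baseline_union: Set[str]) -> Set[str]:
    """Instances a probe solves that no native system solved."""
    return set(probe_solved) - set(baseline_union)


# Toy data for illustration only (not taken from the paper).
toy_pools: Dict[str, PatchPool] = {
    "system_a": {
        "inst-1": [False, True, False],
        "inst-2": [False, False, False],
        "inst-3": [False, False, True],
    },
    "system_b": {
        "inst-1": [False, False, False],
        "inst-2": [True, False, False],
        "inst-3": [False, False, False],
    },
}

baseline = union_solved_at_k(toy_pools, k=3)
probe = {"inst-1", "inst-4"}  # hypothetical probe outcome
print(sorted(baseline))
print(sorted(added_beyond_union(probe, baseline)))
```

Under this reading, the "6 solved instances beyond the native three-system Solved@10 union" corresponds to the size of `added_beyond_union` with K set to 10 and three systems in `pools`.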