This paper addresses a granularity mismatch in memory mechanisms for LLM-based software engineering agents: instance-level memory misguides retrieval when tasks with similar surface descriptions require distinct reasoning logic at specific stages. The authors introduce Structurally Aligned Subtask-Level Memory, which aligns memory storage, retrieval, and updating with the agent's functional decomposition into subtasks. Experiments on SWE-bench Verified show consistent improvements over vanilla agents and instance-level memory baselines, with mean Pass@1 rising by +4.7 pp on average over the vanilla agent and gains growing as the number of interaction steps increases.
LLM-based software engineering agents with instance-level memory can retrieve misleading past experience; aligning memory with subtasks instead raises mean Pass@1 on SWE-bench Verified by 4.7 pp on average over the vanilla agent.
Large Language Models (LLMs) have demonstrated significant potential as autonomous software engineering (SWE) agents. Recent work has further explored augmenting these agents with memory mechanisms to support long-horizon reasoning. However, these approaches typically operate at a coarse instance granularity, treating the entire problem-solving episode as the atomic unit of storage and retrieval. We empirically demonstrate that instance-level memory suffers from a fundamental granularity mismatch, resulting in misguided retrieval when tasks with similar surface descriptions require distinct reasoning logic at specific stages. To address this, we propose Structurally Aligned Subtask-Level Memory, a method that aligns memory storage, retrieval, and updating with the agent's functional decomposition. Extensive experiments on SWE-bench Verified demonstrate that our method consistently outperforms both vanilla agents and strong instance-level memory baselines across diverse backbones, improving mean Pass@1 over the vanilla agent by +4.7 pp on average (e.g., +6.8 pp on Gemini 2.5 Pro). Performance gains grow with more interaction steps, showing that leveraging past experience benefits long-horizon reasoning in complex software engineering tasks.
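To make the contrast with instance-level memory concrete, the following is a minimal sketch of a memory keyed by subtask rather than by whole episode. It is not the authors' implementation: the class name `SubtaskMemory`, the subtask categories (`"localize"`, `"patch"`), the token-overlap similarity, and the 0.8 merge threshold are all assumptions chosen for illustration. A real system would presumably use learned embeddings and the agent's own decomposition.

```python
from __future__ import annotations

from collections import defaultdict


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for an embedding model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class SubtaskMemory:
    """Hypothetical store that keeps experience per subtask category."""

    def __init__(self) -> None:
        # category -> list of (context, lesson) pairs
        self._store: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, category: str, context: str, lesson: str) -> None:
        """Storage: file the lesson under the subtask that produced it."""
        self._store[category].append((context, lesson))

    def retrieve(self, category: str, context: str, k: int = 1) -> list[str]:
        """Retrieval: rank only entries from the same subtask category."""
        candidates = self._store.get(category, [])
        ranked = sorted(candidates, key=lambda e: jaccard(e[0], context), reverse=True)
        return [lesson for _, lesson in ranked[:k]]

    def update(self, category: str, context: str, lesson: str,
               threshold: float = 0.8) -> None:
        """Updating: overwrite a near-duplicate entry, otherwise append."""
        entries = self._store[category]
        for i, (ctx, _) in enumerate(entries):
            if jaccard(ctx, context) >= threshold:
                entries[i] = (context, lesson)
                return
        entries.append((context, lesson))


# Two lessons from one episode share a surface description but belong
# to different stages, so they are stored under different categories.
memory = SubtaskMemory()
issue = "TypeError in serializer when field is None"
memory.add("localize", issue, "search for the serializer class definition first")
memory.add("patch", issue, "guard against None before converting the field")

hints = memory.retrieve("patch", "TypeError in serializer for None field")
```

Because retrieval is scoped to the current subtask, the localization lesson cannot surface while the agent is patching, even though both entries have the same issue text. An instance-level store with a single pool would have no way to separate them.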