This study introduces Ada, an apparatus for repository-level code understanding that lets software engineering agents (SWE agents) explore real codebases through a bounded tool interface while producing recordable trajectories of their behavior. Observation lenses applied to those trajectories make navigation, evidence selection, and self-directed stopping visible, turning raw trajectory data into interpretable behavioral profiles. An analysis of 408 trajectories spanning multiple models, repositories, task families, and launch conditions exposes differences in efficiency, trajectory diversity, and epistemic grounding, and offers a methodological foundation for observing SWE-agent behavior in real codebases.
Ada makes the navigation, evidence-selection, and stopping decisions of software engineering agents observable, turning raw trajectory data into comparable behavioral profiles.
Software engineering agents (SWE agents) increasingly work through tool-mediated trajectories in real repositories, yet their behavior remains difficult to characterize in concrete, observable terms. These trajectories record tool use, intermediate reasoning, evidence selection, and self-directed stopping, but they do not by themselves explain why particular moves were chosen, what evidence was trusted, or when understanding was judged sufficient. This tension makes trajectory data both limited and valuable: faithful, replayable traces can become an empirical substrate for studying agent behavior when interpreted through disciplined observation. We introduce Ada, a scoped apparatus for repository-level code understanding. Ada enters real codebases through a bounded tool interface, allowing open-ended exploration to remain recordable as finite trajectories. Across this wild-but-bounded setting, Ada chooses where to look, what to read closely, when to consolidate partial understanding, and when to close its account of the repository. We project Ada's think-action chains through observation lenses that make navigation, evidence selection, synthesis, grounding, and stopping visible without reducing behavior to raw tool counts or speculating about hidden intent. Read together, these lenses produce behavioral profiles grounded in recorded movement through software worlds. Across 408 trajectories, spanning multiple models, repositories, task families, and launch conditions, the study shows how faithful digital traces can be transformed into disciplined, comparable projections of emerging SWE-agent mindset. The results expose differences in efficiency, trajectory diversity, epistemic grounding, and the limits of intervention, while providing a methodological foundation for observing SWE agent behavior in real codebases.