Stanford HAI | Independent Researcher | KU Leuven | Jun 4, 2026 | arXiv:2606.06448

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

Yasmine Omri, Ziyu Gan, Zachary Broveak, Robin Geens, Zexue He, Alex Pentland, Marian Verhelst, Tsachy Weissman, Thierry Tambe

AI Summary

This paper characterizes the system-level behavior of the memory systems that LLM agents use on long-horizon tasks. It introduces a system-oriented taxonomy that classifies these systems along four axes, and a phase-aware profiling harness that attributes cost to construction, retrieval, and generation. Using the harness, the authors characterize ten representative memory systems across two benchmark suites and show how design choices shift cost across the write and read paths. They derive ten recommendations covering construction scheduling, capability floors, amortization via query volume, freshness-latency tradeoffs, and fleet-scale management.

Key Contribution

Design choices in agent memory systems shift cost across the write and read paths. The paper measures these shifts for ten representative systems, attributing cost to construction, retrieval, and generation.

Abstract

LLM agents are increasingly deployed on long-horizon tasks requiring sustained reasoning over extended interaction histories. Realizing this at scale requires agents to persistently store, retrieve, and update their own memory across sessions. A rich ecosystem of agent memory systems has emerged spanning flat retrieval, LLM-mediated extraction, consolidating fact stores, and agentic control flows. Yet, their system-level behavior remains uncharacterized. We present the first systems characterization of agent memory. First, we introduce a system-oriented taxonomy classifying agent memory systems along four axes. Second, we build a phase-aware profiling harness attributing cost to construction, retrieval, and generation. Third, we characterize ten representative systems across two benchmark suites, uncovering how design choices shift cost across the write and read paths. Finally, we derive 10 system recommendations covering construction scheduling, capability floors, amortization via query volume, freshness-latency tradeoffs, and fleet-scale management.
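To make the idea of phase-aware cost attribution concrete, here is a minimal sketch in Python. It is not the authors' harness: the class name `PhaseProfiler`, its methods, and the use of wall-clock time as the cost metric are all assumptions made for illustration. The phase names (construction, retrieval, generation) come from the abstract. The amortization helper assumes the simplest possible model, in which construction cost is paid once and spread evenly over the queries served.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseProfiler:
    """Accumulates elapsed time per named phase (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the profiler can be tested deterministically.
        self._clock = clock
        self.totals = defaultdict(float)  # phase name -> total elapsed time
        self.calls = defaultdict(int)     # phase name -> number of timed spans

    @contextmanager
    def phase(self, name):
        """Attribute the time spent inside the `with` body to `name`."""
        start = self._clock()
        try:
            yield
        finally:
            self.totals[name] += self._clock() - start
            self.calls[name] += 1

    def amortized_cost_per_query(self, num_queries):
        """Assumed model: (construction + retrieval + generation) / queries."""
        if num_queries <= 0:
            raise ValueError("num_queries must be positive")
        total = (
            self.totals["construction"]
            + self.totals["retrieval"]
            + self.totals["generation"]
        )
        return total / num_queries


if __name__ == "__main__":
    profiler = PhaseProfiler()
    with profiler.phase("construction"):
        memory = [f"fact {i}" for i in range(1000)]  # stand-in for a write path
    with profiler.phase("retrieval"):
        hits = [m for m in memory if m.endswith("7")]  # stand-in for a lookup
    with profiler.phase("generation"):
        answer = ", ".join(hits[:3])  # stand-in for an LLM call
    print(dict(profiler.totals))
```

Under this assumed model, a system with an expensive write path looks cheaper per query as query volume grows, which is the intuition behind amortization via query volume. A real harness would likely also track tokens, memory footprint, and accelerator time per phase.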

Reasoning & Chain-of-Thought | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
