Accenture | Mar 4, 2026 | arXiv:2603.04257

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Zhenting Wang, Huancheng Chen, Jiayun Wang, Wei Wei

AI Summary

The paper introduces Memex, an indexed experience memory mechanism for LLM agents to overcome context window limitations in long-horizon tasks by storing full-fidelity interactions in an external database and using concise summaries with indices in the working context. MemexRL, a reinforcement learning framework, optimizes the agent's write and read behaviors for this indexed memory, learning what to summarize, archive, index, and retrieve based on reward shaping and context budget constraints. Empirical results on long-horizon tasks demonstrate that Memex agents achieve higher task success with a smaller working context compared to summary-only approaches, supported by theoretical analysis showing bounded dereferencing can preserve decision quality.

Key Contribution

LLM agents can keep a compact working context without losing the details: an indexed memory system lets them selectively retrieve full-fidelity past interactions from an external database, outperforming lossy summary-only methods on long-horizon tasks.

Abstract

Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the context budget, and makes distant evidence harder to use even when it is still present. Existing solutions typically shorten context through truncation or running summaries, but these methods are fundamentally lossy because they compress or discard past evidence itself. We introduce Memex, an indexed experience memory mechanism that instead compresses context without discarding evidence. Memex maintains a compact working context consisting of concise structured summaries and stable indices, while storing full-fidelity underlying interactions in an external experience database under those indices. The agent can then decide when to dereference an index and recover the exact past evidence needed for the current subgoal. We optimize both write and read behaviors with our reinforcement learning framework MemexRL, using reward shaping tailored to indexed memory usage under a context budget, so the agent learns what to summarize, what to archive, how to index it, and when to retrieve it. This yields a substantially less lossy form of long-horizon memory than summary-only approaches. We further provide a theoretical analysis showing the potential of the Memex loop to preserve decision quality with bounded dereferencing while keeping effective in-context computation bounded as history grows. Empirically, on challenging long-horizon tasks, a Memex agent trained with MemexRL improves task success while using a significantly smaller working context.
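
The archive-and-dereference loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `IndexedMemory`, its methods, and the `mem:N` index format are all hypothetical, and the summary is supplied by the caller rather than learned. It only shows the core idea that the working context holds a short summary plus a stable index, while the full record stays in an external store and can be recovered exactly.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedMemory:
    """Toy indexed memory (hypothetical names, not the paper's API)."""

    # External experience database: index -> full-fidelity record.
    store: dict = field(default_factory=dict)
    # Compact working context: one "[index] summary" line per archived record.
    context: list = field(default_factory=list)
    _next_id: int = 0

    def archive(self, full_text: str, summary: str) -> str:
        """Store the full record externally; keep only summary + index in context."""
        index = f"mem:{self._next_id}"
        self._next_id += 1
        self.store[index] = full_text
        self.context.append(f"[{index}] {summary}")
        return index

    def dereference(self, index: str) -> str:
        """Recover the exact archived record for a given index."""
        return self.store[index]

    def context_size(self) -> int:
        """Character count of the working context (a crude stand-in for tokens)."""
        return sum(len(line) for line in self.context)


# Usage: archive a long tool output, then recover it verbatim on demand.
memory = IndexedMemory()
tool_output = "line: ok\n" * 200 + "line 201: ERROR code=17\n"
idx = memory.archive(tool_output, "build log; one error near the end")
recovered = memory.dereference(idx)
```

In this sketch the working context grows only by the length of each summary line, while `dereference` returns the archived text unchanged. Deciding what to summarize, how to index, and when to dereference is what the abstract says MemexRL learns; none of that is modeled here.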

Reasoning & Chain-of-Thought | Recommendation & Information Retrieval | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 47

Year: 2026

Venue: N/A
