OCR-Memory is a memory framework for LLM agents operating in long-horizon environments. It addresses the limitations of text-based memory systems by storing agent experience visually: trajectories are rendered into images annotated with visual identifiers, so long histories can be stored and retrieved within strict context limits. Experiments on long-horizon benchmarks show that, compared to text-based alternatives, OCR-Memory increases effective memory capacity while preserving faithful evidence recovery.
LLM agents can now remember far more, far more accurately, by "seeing" their past experiences instead of just reading about them.
Autonomous LLM agents increasingly operate in long-horizon, interactive settings where success depends on reusing experience accumulated over extended histories. However, existing agent memory systems are fundamentally constrained by text-context budgets: storing or revisiting raw trajectories is prohibitively token-expensive, while summarization and text-only retrieval trade token savings for information loss and fragmented evidence. To address this limitation, we propose Optical Context Retrieval Memory (OCR-Memory), a memory framework that leverages the visual modality as a high-density representation of agent experience, enabling retention of arbitrarily long histories with minimal prompt overhead at retrieval time. Specifically, OCR-Memory renders historical trajectories into images annotated with unique visual identifiers. OCR-Memory retrieves stored experience via a locate-and-transcribe paradigm that selects relevant regions through visual anchors and retrieves the corresponding verbatim text, avoiding free-form generation and reducing hallucination. Experiments on long-horizon agent benchmarks show consistent gains under strict context limits, demonstrating that optical encoding increases effective memory capacity while preserving faithful evidence recovery.
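The locate-and-transcribe flow described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the class `OpticalMemorySketch`, the anchor format `[0000]`, and the keyword-overlap scorer are all hypothetical, and no image is actually rendered. The scorer merely stands in for a vision model that would read the rendered trajectory image and emit anchor identifiers. What the sketch does show is the key property claimed above: the returned evidence is looked up by anchor and copied verbatim, never generated.

```python
from dataclasses import dataclass, field


@dataclass
class OpticalMemorySketch:
    """Toy stand-in for an image-backed memory; no real rendering happens here."""

    segments: dict = field(default_factory=dict)  # anchor id -> verbatim text
    _next_id: int = 0

    def write(self, step_text: str) -> str:
        """Store one trajectory step under a fresh anchor id (hypothetical format)."""
        anchor = f"[{self._next_id:04d}]"
        self._next_id += 1
        self.segments[anchor] = step_text
        return anchor

    def locate(self, query: str, top_k: int = 2) -> list:
        """Pick anchors relevant to the query.

        Assumption: a keyword-overlap score replaces the vision model that
        would inspect the rendered image and output anchor ids.
        """
        query_words = set(query.lower().split())
        scored = []
        for anchor, text in self.segments.items():
            overlap = len(query_words & set(text.lower().split()))
            if overlap > 0:
                scored.append((overlap, anchor))
        # Highest overlap first; ties are broken by anchor order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [anchor for _, anchor in scored[:top_k]]

    def transcribe(self, anchors: list) -> list:
        """Return stored text verbatim by lookup, so nothing is generated."""
        return [self.segments[a] for a in anchors if a in self.segments]


memory = OpticalMemorySketch()
memory.write("open drawer 1: found a red key")
memory.write("go to kitchen: the fridge is locked")
memory.write("use red key on fridge: fridge opens")

anchors = memory.locate("red key")
evidence = memory.transcribe(anchors)
```

Because `transcribe` only performs a dictionary lookup, an anchor that was never written simply yields no evidence rather than invented text, which mirrors the abstract's point about avoiding free-form generation.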