HKU, Tsukuba, University of North Texas, Yonsei | Apr 29, 2026 | arXiv:2604.26622

OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

Jinze Li, Yang Zhang, Xin Yang, Jiayi Qu, Jinfeng Xu, Shuo Yang, Junhua Ding, Edith Cheuk-Han Ngai

AI Summary

OCR-Memory is introduced as a novel memory framework for LLM agents operating in long-horizon environments, addressing the limitations of text-based memory systems by leveraging visual representations of agent experience. The system renders trajectories into images annotated with visual identifiers, enabling efficient storage and retrieval of long histories within strict context limits. Experiments on long-horizon benchmarks demonstrate that OCR-Memory increases effective memory capacity and preserves faithful evidence recovery compared to text-based alternatives.

Key Contribution

By storing past experience as annotated images rather than text, LLM agents can retain much longer histories within the same context budget and recover the original text verbatim instead of relying on lossy summaries.

Abstract

Autonomous LLM agents increasingly operate in long-horizon, interactive settings where success depends on reusing experience accumulated over extended histories. However, existing agent memory systems are fundamentally constrained by text-context budgets: storing or revisiting raw trajectories is prohibitively token-expensive, while summarization and text-only retrieval trade token savings for information loss and fragmented evidence. To address this limitation, we propose Optical Context Retrieval Memory (OCR-Memory), a memory framework that leverages the visual modality as a high-density representation of agent experience, enabling retention of arbitrarily long histories with minimal prompt overhead at retrieval time. Specifically, OCR-Memory renders historical trajectories into images annotated with unique visual identifiers. OCR-Memory retrieves stored experience via a locate-and-transcribe paradigm that selects relevant regions through visual anchors and retrieves the corresponding verbatim text, avoiding free-form generation and reducing hallucination. Experiments on long-horizon agent benchmarks show consistent gains under strict context limits, demonstrating that optical encoding increases effective memory capacity while preserving faithful evidence recovery.
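The locate-and-transcribe idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `OpticalMemory`, its method names, and the anchor format `[001]` are hypothetical, and the "locate" step uses simple word overlap as a stand-in for the vision model that would select anchors from rendered images. The point it shows is that the model only has to choose identifiers, while the returned evidence is the stored text, looked up verbatim rather than generated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OpticalMemory:
    """Toy identifier-indexed memory (hypothetical, for illustration only)."""

    # Maps an anchor such as "[001]" to the verbatim trajectory text.
    segments: dict[str, str] = field(default_factory=dict)

    def write(self, text: str) -> str:
        # Assign a unique identifier; in the paper's setting this would be
        # the visual tag drawn next to the text on the rendered image.
        anchor = f"[{len(self.segments) + 1:03d}]"
        self.segments[anchor] = text
        return anchor

    def locate(self, query: str, top_k: int = 1) -> list[str]:
        # ASSUMPTION: word overlap stands in for the vision model that
        # would pick relevant anchors by looking at the images.
        query_words = set(query.lower().split())

        def overlap(anchor: str) -> int:
            return len(query_words & set(self.segments[anchor].lower().split()))

        # sorted() is stable, so ties keep insertion order.
        return sorted(self.segments, key=overlap, reverse=True)[:top_k]

    def transcribe(self, anchors: list[str]) -> list[str]:
        # Return stored text exactly as written: a lookup, not generation.
        return [self.segments[a] for a in anchors]


memory = OpticalMemory()
memory.write("Step 1: opened the drawer and found a key")
memory.write("Step 2: unlocked the cabinet with the key")
memory.write("Step 3: put the mug in the microwave")

hits = memory.locate("where is the mug", top_k=1)
evidence = memory.transcribe(hits)
```

Because `transcribe` is a dictionary lookup keyed by the selected anchors, the only thing the locating model can get wrong is which segment it picks; it cannot alter the wording of the evidence.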

Multimodal Models, Recommendation & Information Retrieval, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
