HKU, Huawei | Jun 9, 2026 | arXiv:2606.10616

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

Qingcan Kang, Liu Mingyang, Shixiong Kai, Kaichao Liang, Mingxuan Yuan

AI Summary

This paper frames memory retention in long-horizon language agents as a constrained stochastic optimization problem that models the long-term consequences of retention decisions under observability constraints. The proposed OSL-MR framework separates online-observable features from offline-available supervision, so the agent learns the value of evidence from interaction data while staying within its memory budget. Experiments on LOCOMO and LongMemEval show that OSL-MR consistently outperforms recency-based methods, Generative Agents-style scoring, and other heuristic baselines, particularly under tight memory budgets.

Key Contribution

Retention policies that model the long-term consequences of what is kept or discarded improve language-agent performance over heuristic baselines, particularly under tight memory budgets.

Abstract

Long-horizon language agents accumulate observations, reasoning traces, and retrieved facts that exceed their finite context windows, making memory retention a fundamental resource-allocation problem. Existing memory systems improve management through heuristic scoring, retrieval optimization, or learned compression, but largely treat retention as a local decision problem and do not explicitly model its long-term consequences under realistic observability constraints. To fill this gap, we formulate memory retention as a constrained stochastic optimization problem with explicit budget feasibility, evidence utility, and delayed costs including miss penalties, reacquisition delays, and stale-information risk. We then propose OSL-MR (Observability-Safe Learning for Memory Retention), a novel framework that enforces a strict separation between online-observable features and offline-available supervision (OAS). OSL-MR combines an evidence learner trained from realized evidence supervision with a Mixed-Score heuristic that serves both as a deployable online-safe baseline and as a structured inductive prior for learning. The resulting policy learns query-conditioned evidence value directly from interaction data while remaining deployable under the same observability constraints. Experiments on LOCOMO and LongMemEval show that OSL-MR consistently outperforms recency-based methods, Generative Agents-style scoring, and other heuristic baselines, particularly under tight memory budgets. The Mixed-Score prior further improves precision while preserving recall, and sensitivity analysis demonstrates robustness across a wide range of cost configurations.
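To make the budget-feasibility idea concrete, here is a minimal toy sketch of retention under a token budget. It is not the paper's formulation: the abstract does not define the Mixed-Score heuristic or the cost terms, so `MemoryItem`, `mixed_score`, `retain_under_budget`, the exponential recency decay, and the equal weighting are all hypothetical choices made for illustration. The only property it tries to mirror is that scoring uses features available online (item age and query relevance) and that the retained set never exceeds the budget.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    item_id: str
    tokens: int        # size charged against the memory budget (must be > 0)
    age_steps: int     # steps since the item was written (observable online)
    relevance: float   # similarity to the current query in [0, 1] (observable online)


def mixed_score(item, recency_weight=0.5, decay=0.1):
    """Hypothetical blend of recency and query relevance (assumed form)."""
    recency = math.exp(-decay * item.age_steps)
    return recency_weight * recency + (1.0 - recency_weight) * item.relevance


def retain_under_budget(items, budget_tokens):
    """Greedily keep the items with the highest score per token that still fit."""
    ranked = sorted(items, key=lambda it: mixed_score(it) / it.tokens, reverse=True)
    kept, used = [], 0
    for it in ranked:
        if used + it.tokens <= budget_tokens:
            kept.append(it)
            used += it.tokens
    return kept, used


items = [
    MemoryItem("a", tokens=50, age_steps=0, relevance=0.1),
    MemoryItem("b", tokens=50, age_steps=20, relevance=0.9),
    MemoryItem("c", tokens=100, age_steps=10, relevance=0.2),
]
kept, used = retain_under_budget(items, budget_tokens=100)
print([it.item_id for it in kept], used)
```

A greedy score-per-token rule is only a simple approximation to a knapsack-style selection, and it ignores the delayed costs the abstract mentions (miss penalties, reacquisition delays, stale-information risk). A learned evidence-value model would, under these assumptions, replace or be combined with `mixed_score`.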

Reasoning & Chain-of-Thought Scalable Oversight & Alignment Theory

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
