This paper introduces a reinforcement learning framework in which a ContextCurator policy model actively manages the context window of a frozen LLM TaskExecutor, mitigating the context bottleneck in long-horizon tasks. The ContextCurator is trained to prune noisy information while preserving critical reasoning anchors, which improves task performance and reduces token consumption. Experiments on WebArena and DeepSearch show higher success rates, with context management comparable or superior to that of much larger models such as GPT-4o at significantly lower computational cost.
A lightweight, RL-trained 7B context curator can match GPT-4o's context management abilities, cutting token consumption by a factor of 8 on DeepSearch and opening the door to efficient long-horizon LLM agents.
Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context management from task execution. Our architecture pairs a lightweight, specialized policy model, ContextCurator, with a powerful frozen foundation model, TaskExecutor. Trained via reinforcement learning, ContextCurator actively reduces information entropy in the working memory. It aggressively prunes environmental noise while preserving reasoning anchors, that is, sparse data points that are critical for future deductions. On WebArena, our framework improves the success rate of Gemini-3.0-flash from 36.4% to 41.2% while reducing token consumption by 8.8% (from 47.4K to 43.3K). On DeepSearch, it raises the success rate from 53.9% to 57.1% while reducing token consumption by a factor of 8. Remarkably, a 7B ContextCurator matches the context management performance of GPT-4o, providing a scalable and computationally efficient paradigm for autonomous long-horizon agents.
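The split between context management and task execution described above can be pictured as a simple loop. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the `Curator` and `Executor` signatures, the `ANCHOR:` tag, and the whitespace-based token count are all hypothetical, and a keyword rule stands in for the RL-trained policy.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces; neither signature comes from the paper.
Curator = Callable[[List[str]], List[str]]  # working memory -> pruned working memory
Executor = Callable[[List[str]], str]       # curated context -> next action


def run_episode(
    observations: List[str], curator: Curator, executor: Executor
) -> Tuple[List[str], int, List[str]]:
    """Alternate curation and execution over a stream of observations.

    Returns the actions, the total context size shown to the executor
    (whitespace-separated words as a crude token proxy), and the final memory.
    """
    memory: List[str] = []
    actions: List[str] = []
    tokens_seen = 0
    for obs in observations:
        memory.append(obs)
        memory = curator(memory)  # the curator rewrites the working memory
        tokens_seen += sum(len(item.split()) for item in memory)
        actions.append(executor(memory))  # the frozen executor sees only curated context
    return actions, tokens_seen, memory


def keyword_curator(memory: List[str]) -> List[str]:
    """Toy stand-in for the learned policy: keep tagged anchors plus the newest observation."""
    return [m for m in memory[:-1] if m.startswith("ANCHOR:")] + memory[-1:]


def identity_curator(memory: List[str]) -> List[str]:
    """Baseline with no pruning: the executor sees the full history."""
    return list(memory)


def stub_executor(context: List[str]) -> str:
    """Placeholder for the frozen LLM; it only reports how much context it received."""
    return f"act_on({len(context)} items)"


obs = [
    "ANCHOR: order id is 4417",
    "nav menu footer ads",
    "cookie banner text",
    "ANCHOR: refund page found",
]
_, pruned_tokens, pruned_memory = run_episode(obs, keyword_curator, stub_executor)
_, full_tokens, full_memory = run_episode(obs, identity_curator, stub_executor)
print(pruned_tokens, full_tokens)
```

In this toy run the curated loop shows the executor less context than the unpruned baseline while both anchor lines survive to the final step. In the framework summarized above, the keep-or-drop decision would come from the trained ContextCurator rather than from a fixed tag.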