The paper introduces Agent Evolving Learning (AEL), a two-timescale framework designed to improve LLM agent performance in open-ended environments by enabling them to leverage past experiences. AEL uses a Thompson Sampling bandit to select memory retrieval policies at a fast timescale and employs LLM-driven reflection to diagnose failure patterns and update the agent's decision prompt at a slower timescale. Experiments on a sequential portfolio benchmark demonstrate that AEL outperforms existing self-improving methods, highlighting that effective self-diagnosis of experience usage is more critical than architectural complexity for agent improvement.
Forget complex architectures: the secret to self-improving LLM agents lies in teaching them how to *interpret* their past failures, not just remember them.
LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not \emph{what} to remember but \emph{how to use} what has been remembered, including which retrieval policy to apply, how to interpret prior outcomes, and when the current strategy itself must change. We introduce \emph{Agent Evolving Learning} (\ael{}), a two-timescale framework that addresses this obstacle. At the fast timescale, a Thompson Sampling bandit learns which memory retrieval policy to apply at each episode; at the slow timescale, LLM-driven reflection diagnoses failure patterns and injects causal insights into the agent's decision prompt, giving it an interpretive frame for the evidence it retrieves. On a sequential portfolio benchmark (10 sector-diverse tickers, 208 episodes, 5 random seeds), \ael{} achieves a Sharpe ratio of 2.13$\pm$0.47, outperforming five published self-improving methods and all non-LLM baselines while maintaining the lowest variance among all LLM-based approaches. A nine-variant ablation reveals a ``less is more'' pattern: memory and reflection together produce a 58\% cumulative improvement over the stateless baseline, yet every additional mechanism we test (planner evolution, per-tool selection, cold-start initialization, skill extraction, and three credit assignment methods) \emph{degrades} performance. This demonstrates that the bottleneck in agent self-improvement is \emph{self-diagnosing how to use} experience rather than adding architectural complexity. Code and data: https://github.com/WujiangXu/AEL.
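The fast-timescale mechanism in the abstract can be sketched as a standard Beta-Bernoulli Thompson Sampling bandit over retrieval policies. This is a minimal illustration of the technique, not the paper's implementation: the policy names, the binary success signal, and the `update` convention are illustrative assumptions.

```python
import random


class ThompsonSamplingSelector:
    """Beta-Bernoulli Thompson Sampling over a set of memory retrieval policies.

    Each policy keeps a Beta(alpha, beta) posterior over its per-episode
    success probability; at each episode we sample from every posterior and
    run the policy with the highest draw.
    """

    def __init__(self, policies):
        self.policies = list(policies)
        # Uniform Beta(1, 1) priors: every policy starts equally plausible.
        self.alpha = {p: 1.0 for p in self.policies}
        self.beta = {p: 1.0 for p in self.policies}

    def select(self):
        # Sample a plausible success rate from each posterior, pick the argmax.
        draws = {p: random.betavariate(self.alpha[p], self.beta[p])
                 for p in self.policies}
        return max(draws, key=draws.get)

    def update(self, policy, success):
        # A binary episode outcome updates only the chosen policy's posterior.
        if success:
            self.alpha[policy] += 1.0
        else:
            self.beta[policy] += 1.0


if __name__ == "__main__":
    # Illustrative loop: policy names and the simulated outcome model are
    # hypothetical stand-ins for the agent's real retrieval strategies.
    random.seed(0)
    true_rate = {"recency": 0.3, "similarity": 0.7, "outcome-weighted": 0.3}
    selector = ThompsonSamplingSelector(true_rate.keys())
    for _ in range(208):  # one pull per episode, as in the benchmark
        chosen = selector.select()
        selector.update(chosen, random.random() < true_rate[chosen])
    pulls = {p: selector.alpha[p] + selector.beta[p] - 2.0
             for p in selector.policies}
    print(pulls)  # pull counts should concentrate on "similarity"
```

Because exploration is driven by posterior sampling rather than an explicit epsilon, the bandit keeps trying under-sampled policies early on and concentrates on the empirically best one as evidence accumulates.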