Notre Dame | Jun 11, 2026 | arXiv:2606.12780

ProPlay: Procedural World Models for Self-Evolving LLM Agents

Yijun Ma, Zehong Wang, Yiyang Li, Ziming Li, Xiaoguang Guo, Weixiang Sun, Chuxu Zhang, Yanfang Ye

AI Summary

This paper introduces ProPlay, a procedural world model for self-evolving LLM agents that operate in partially observable environments. ProPlay abstracts successful trajectories into a procedure graph, simulates future procedural paths over that graph before each episode as structured soft guidance, and refines the graph after execution using environment feedback. Experiments on public benchmarks show that ProPlay consistently improves environment understanding and self-evolution capability over strong baselines.

Key Contribution

ProPlay lets agents rehearse future procedural paths over a learned procedure graph before acting, which improves self-evolution and environment understanding relative to the baselines evaluated.

Abstract

Self-evolving agents are expected to improve through interaction without external supervision, but this remains difficult in partially observable environments where agents must explore actively, learn from limited feedback, and decide when to trust prior experience. Existing LLM-agent methods often rely on memory or planning modules, yet they rarely close the loop between them to continually refine an internal understanding of environment dynamics. We introduce ProPlay, a procedural world model that supports procedure-level preplay, in which agents rehearse future procedural paths using learned world knowledge. Rather than representing experience as isolated rules or low-level action constraints, ProPlay abstracts successful trajectories into procedures and organizes them in a procedure graph that captures causal transitions among task stages. Each transition is associated with a reliability record embedding that estimates its task-specific contribution from past outcomes. Before each episode, ProPlay simulates future procedural trajectories over known graph structures and uses them as structured soft guidance; after execution, it refines the graph using environment feedback. Experiments on public benchmarks show that ProPlay consistently improves environment understanding and self-evolution capability over strong baselines. Our code is available at https://github.com/antman9914/proplay.
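
As a rough illustration of the loop the abstract describes (organize procedures in a graph, score transitions from past outcomes, simulate known paths before an episode, refine the graph afterward), here is a minimal Python sketch. It is not the authors' implementation. The names `ProcedureGraph`, `Transition`, `preplay`, and `update` are hypothetical, and a smoothed success rate stands in for the paper's reliability record embedding, which is an assumption made purely to keep the example small.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Transition:
    """Edge between two procedures, with a simple outcome record (assumed form)."""
    successes: int = 0
    trials: int = 0

    def reliability(self) -> float:
        # Laplace-smoothed success rate; a stand-in for the paper's
        # reliability record embedding, not its actual definition.
        return (self.successes + 1) / (self.trials + 2)


@dataclass
class ProcedureGraph:
    """Hypothetical procedure graph: edges[src][dst] -> Transition."""
    edges: dict = field(default_factory=lambda: defaultdict(dict))

    def update(self, path, success):
        """Refine the graph after an episode using its outcome as feedback."""
        for src, dst in zip(path, path[1:]):
            transition = self.edges[src].setdefault(dst, Transition())
            transition.trials += 1
            transition.successes += int(success)

    def preplay(self, start, goal, max_depth=6):
        """Enumerate known paths from start to goal, most reliable first.

        A path's score is the product of its transition reliabilities.
        """
        results = []

        def walk(node, path, score):
            if node == goal:
                results.append((score, path))
                return
            if len(path) > max_depth:
                return
            for nxt, transition in self.edges.get(node, {}).items():
                if nxt not in path:  # avoid revisiting a procedure
                    walk(nxt, path + [nxt], score * transition.reliability())

        walk(start, [start], 1.0)
        return sorted(results, key=lambda item: item[0], reverse=True)


# Usage: two past episodes, then a preplay pass before the next one.
graph = ProcedureGraph()
graph.update(["find_key", "open_door", "reach_goal"], success=True)
graph.update(["find_key", "break_window", "reach_goal"], success=False)
ranked = graph.preplay("find_key", "reach_goal")
best_score, best_path = ranked[0]
```

In this sketch the ranked paths would be handed to the agent as suggestions rather than hard constraints, which is one way to read "structured soft guidance"; how the paper actually injects the guidance into the LLM is not specified in the abstract.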

Tool Use & Agents | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 47

Year: 2026

Venue: N/A
