The paper introduces SimuRA, a novel architecture for goal-oriented agents that incorporates a world model for planning via simulation, addressing limitations of black-box autoregressive reasoning. SimuRA uses LLMs as a substrate for the world model, leveraging natural language as a discrete, hierarchical representation for planning. Experiments on complex web-browsing tasks show that SimuRA improves task completion rates by up to 124% over a matched black-box autoregressive baseline.
LLM agents achieve up to 124% higher task completion rates when coupled with a world model that enables simulative reasoning, suggesting a path beyond token-by-token autoregression.
AI agents built on foundation models hold enormous promise. Current practice, however, focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also faces practical limitations from black-box autoregressive reasoning, where decisions unfold token by token without explicit simulation or counterfactual evaluation of outcomes. Humans, on the other hand, reason and plan by mentally simulating the consequences of actions within an internal model of the world, a capability that supports flexible, goal-directed behavior across diverse contexts. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of an optimal agent in any general environment, SimuRA addresses the limitations of black-box autoregressive reasoning by incorporating a world model for planning via simulation. Our prototype world model is implemented using LLMs as a substrate, leveraging natural language as a discrete, hierarchical representation grounded in concepts for planning, while remaining model-agnostic. On complex web-browsing tasks such as flight search, SimuRA improves the success rate from 0% to 32.2% compared to a representative open-web agent baseline. Across tasks, world-model-based planning achieves up to 124% higher task completion rates than a matched black-box autoregressive baseline, demonstrating the advantages of simulative reasoning. We release ReasonerAgent-Web, a web-browsing agent built on SimuRA, as an open-source research demo.
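The idea of planning via simulation described in the abstract can be illustrated with a minimal sketch. This is not the SimuRA implementation: the toy integer environment, the function names (`toy_world_model`, `toy_critic`, `plan_by_simulation`, `run_agent`), and the exhaustive search over short action sequences are all assumptions made for illustration. In the paper's setting the world model would be backed by an LLM operating on natural-language states; here a deterministic function stands in for it so the example runs on its own.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy types: a state is an integer, an action is a string label.
State = int
Action = str


def toy_world_model(state: State, action: Action) -> State:
    """Predict the next state for an action (stand-in for an LLM world model)."""
    if action == "inc":
        return state + 1
    if action == "double":
        return state * 2
    raise ValueError(f"unknown action: {action}")


def toy_critic(state: State, goal: State) -> float:
    """Score a simulated state; higher is better (0 means the goal is reached)."""
    return -abs(goal - state)


def plan_by_simulation(
    state: State,
    goal: State,
    actions: Sequence[Action],
    world_model: Callable[[State, Action], State],
    critic: Callable[[State, State], float],
    horizon: int = 3,
) -> Action:
    """Simulate every action sequence of length `horizon` and return the
    first action of the sequence whose predicted end state scores best."""
    best_plan: Tuple[Action, ...] = ()
    best_score = float("-inf")
    for plan in product(actions, repeat=horizon):
        simulated = state
        for action in plan:
            simulated = world_model(simulated, action)  # imagined, not executed
        score = critic(simulated, goal)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan[0]


def run_agent(start: State, goal: State, max_steps: int = 10) -> Tuple[State, List[Action]]:
    """Replan at every step: simulate, commit to one action, observe, repeat."""
    state = start
    trajectory: List[Action] = []
    for _ in range(max_steps):
        if state == goal:
            break
        action = plan_by_simulation(
            state, goal, ("inc", "double"), toy_world_model, toy_critic
        )
        # In this toy setup the real environment matches the world model exactly.
        state = toy_world_model(state, action)
        trajectory.append(action)
    return state, trajectory


if __name__ == "__main__":
    final_state, trajectory = run_agent(start=1, goal=10)
    print(final_state, trajectory)
```

The contrast with black-box autoregressive reasoning is that candidate actions are evaluated against predicted outcomes before one is committed, rather than being emitted directly. A practical agent would likely sample a handful of candidate actions instead of enumerating all sequences, since the number of sequences grows exponentially with the horizon.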