The paper introduces Dynamics Modelling (DyMo), a post-training method that equips LLMs with a state prediction capability so they model environment dynamics alongside function calling. This lets the LLM internally simulate the effects of its actions, reducing the need for expensive real-world trials. Experiments on the Berkeley Function Calling Leaderboard V2 show that DyMo improves success rates and reduces hallucinations, and, when integrated with self-verification sampling (SVS), significantly improves pass^k and enables the model to abstain from unreliable outputs.
LLMs can now plan ahead and avoid hallucinations in tool-use tasks by learning a world model, opening the door to more reliable agents.
Tool use in stateful environments presents unique challenges for large language models (LLMs): existing test-time compute strategies that rely on repeated trials in the environment are impractical. We propose dynamics modelling (DyMo), a method that augments LLMs with a state prediction capability alongside function calling during post-training. This enables LLMs to predict the future states resulting from their actions through an internal environment model. On the Berkeley Function Calling Leaderboard V2, DyMo improves success rates and significantly reduces hallucinations. We further integrate the internal environment model into self-verification sampling (SVS), and show that this substantially improves pass^k as a function of the number of trials k, and allows the model to refuse to emit unreliable outputs. Together, DyMo and SVS greatly enhance the effectiveness and reliability of LLMs for tool use. We believe this work charts a path towards scalable planning RL methods for LLM inference without repeatedly querying the oracle environment.
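The self-verification loop described above can be sketched in a few lines: sample several candidate tool calls, use an internal dynamics model to predict each call's resulting state, keep a candidate only if its predicted state satisfies the goal, and abstain when none verify. This is a minimal illustrative sketch, not the paper's implementation; the sampler, dynamics model, and tool schema below are all toy stand-ins.

```python
import random

def sample_candidate_calls(task, k, rng):
    """Toy stand-in for the LLM's tool-call sampler: propose k candidates."""
    return [{"name": "set_volume", "args": {"level": rng.randint(0, 10)}}
            for _ in range(k)]

def predict_next_state(state, call):
    """Toy stand-in for the learned dynamics model: predict the post-call
    environment state without executing the call in the real environment."""
    new_state = dict(state)
    if call["name"] == "set_volume":
        new_state["volume"] = call["args"]["level"]
    return new_state

def self_verification_sampling(task, state, goal, k=8, seed=0):
    """Return the first candidate whose *predicted* outcome satisfies the
    goal, or None (abstain) when no candidate verifies."""
    rng = random.Random(seed)
    for call in sample_candidate_calls(task, k, rng):
        if goal(predict_next_state(state, call)):
            return call
    return None  # abstain rather than emit an unreliable tool call
```

The key design point mirrored here is that verification queries the internal environment model rather than the oracle environment, so drawing more candidates (larger k) costs only inference compute, not real-world trials.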