Microsoft Research, DUT, MBZUAI, PKU | Apr 15, 2026 | arXiv:2604.13824

Beyond State Consistency: Behavior Consistency in Text-Based World Models

Youling Huang, Guanqiao Chen, Junchi Yao, Lu Wang, Fangkai Yang, Chao Du, ChenZhuo Zhao, Pu Zhao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

AI Summary

This paper introduces a behavior-aligned training paradigm for text-based world models, moving beyond state consistency to focus on functional consistency with the real environment. The authors optimize a step-level metric called Behavior Consistency Reward (BehR), which measures the change in likelihood of a logged next action between the real state and the world-model-predicted state under a frozen Reference Agent. Experiments on WebShop and TextWorld show that BehR-based training improves long-term alignment in several settings and reduces false positives in offline surrogate evaluation, while preserving or improving single-step prediction quality in three of four settings.

Key Contribution

Stop obsessing over state prediction accuracy in text-based world models: aligning them with *behavior* yields better long-term planning and evaluation.

Abstract

World models have been emerging as critical components for assessing the consequences of actions generated by interactive agents in online planning and offline evaluation. In text-based environments, world models are typically evaluated and trained with single-step metrics such as Exact Match, aiming to improve the similarity between predicted and real-world states, but such metrics have been shown to be insufficient for capturing actual agent behavior. To address this issue, we introduce a new behavior-aligned training paradigm aimed at improving the functional consistency between the world model and the real environment. This paradigm focuses on optimizing a tractable step-level metric named Behavior Consistency Reward (BehR), which measures how much the likelihood of a logged next action changes between the real state and the world-model-predicted state under a frozen Reference Agent. Experiments on WebShop and TextWorld show that BehR-based training improves long-term alignment in several settings, with the clearest gains in WebShop and less movement in near-ceiling regimes, while preserving or improving single-step prediction quality in three of four settings. World models trained with BehR also achieve lower false positives in offline surrogate evaluation and show modest but encouraging gains in inference-time lookahead planning.
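The step-level idea described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula for BehR, so the form used here (the negative absolute gap in the logged action's log-likelihood) is an assumption, and `Policy`, `toy_policy`, and `behavior_consistency` are hypothetical names introduced only for illustration; they are not the authors' implementation.

```python
import math
from typing import Callable, Dict

# Hypothetical interface: a frozen reference agent maps a state text to a
# probability distribution over candidate action strings.
Policy = Callable[[str], Dict[str, float]]


def action_log_likelihood(policy: Policy, state: str, action: str,
                          eps: float = 1e-12) -> float:
    """Log-probability the frozen agent assigns to `action` in `state`."""
    probs = policy(state)
    # Clamp to eps so an unseen action does not produce log(0).
    return math.log(max(probs.get(action, 0.0), eps))


def behavior_consistency(policy: Policy, real_state: str,
                         predicted_state: str, logged_action: str) -> float:
    """Assumed reward form: negative absolute log-likelihood gap (0 is best)."""
    real = action_log_likelihood(policy, real_state, logged_action)
    pred = action_log_likelihood(policy, predicted_state, logged_action)
    return -abs(real - pred)


def toy_policy(state: str) -> Dict[str, float]:
    """Toy stand-in for a reference agent; a real one would be an LLM."""
    if "buy" in state:
        return {"click[buy]": 0.8, "search[shoes]": 0.2}
    return {"click[buy]": 0.1, "search[shoes]": 0.9}


# A predicted state that differs in wording but keeps the agent's behavior
# scores 0; one that drops the buy button is penalized.
same = behavior_consistency(toy_policy, "page with buy button",
                            "page with buy button and price", "click[buy]")
diff = behavior_consistency(toy_policy, "page with buy button",
                            "search results page", "click[buy]")
```

Under this assumed form, two states that differ textually but lead the reference agent to the same action distribution receive the same reward, which is the contrast with Exact Match that the abstract draws.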

Eval Frameworks & Benchmarks, Tool Use & Agents, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
