This paper introduces Mobile World Models (MWM), a framework for improving action-conditioned consistency in world models for embodied navigation. MWM employs a two-stage training process involving structure pretraining and Action-Conditioned Consistency (ACC) post-training, followed by Inference-Consistent State Distillation (ICSD) to enhance rollout consistency during few-step diffusion inference. Experiments on benchmark and real-world tasks demonstrate that MWM improves visual fidelity, trajectory accuracy, planning success, and inference efficiency compared to existing approaches.
By explicitly enforcing action-conditioned consistency during training and distillation, MWM enables more reliable planning in imagined future spaces for embodied navigation.
World models enable planning in an imagined space of predicted futures, offering a promising framework for embodied navigation. However, existing navigation world models often lack action-conditioned consistency, so visually plausible predictions can still drift under multi-step rollout and degrade planning. Moreover, efficient deployment requires few-step diffusion inference, but existing distillation methods do not explicitly preserve rollout consistency, creating a training-inference mismatch. To address these challenges, we propose MWM, a mobile world model for planning-based image-goal navigation. Specifically, we introduce a two-stage training framework that combines structure pretraining with Action-Conditioned Consistency (ACC) post-training to improve action-conditioned rollout consistency. We further introduce Inference-Consistent State Distillation (ICSD) for few-step diffusion distillation with improved rollout consistency. Our experiments on benchmark and real-world tasks demonstrate consistent gains in visual fidelity, trajectory accuracy, planning success, and inference efficiency. Code: https://github.com/AIGeeksGroup/MWM. Website: https://aigeeksgroup.github.io/MWM.
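To illustrate why rollout consistency matters for planning, the following is a minimal Python sketch and not the MWM implementation. It assumes a toy 2D state in place of images, and every name in it (`toy_world_model`, `rollout`, `plan`, `drift`, and the `bias` parameter) is hypothetical. The planner is generic random shooting, swapped in purely for illustration: it samples candidate action sequences, imagines their outcomes with the model, and keeps the sequence that ends closest to the goal. The `drift` helper shows how a small per-step action-conditioning error accumulates when predictions are fed back autoregressively.

```python
import math
import random


def toy_world_model(state, action, bias=0.0):
    """Hypothetical stand-in for a learned world model: predicts the next 2D position.

    `bias` mimics a small per-step error in how the model responds to actions.
    """
    x, y = state
    dx, dy = action
    return (x + dx + bias, y + dy)


def rollout(state, actions, bias=0.0):
    """Apply the model autoregressively: each prediction is the input to the next step."""
    for action in actions:
        state = toy_world_model(state, action, bias)
    return state


def plan(start, goal, horizon=8, num_candidates=256, seed=0):
    """Random-shooting planner over imagined rollouts (an illustrative choice).

    Samples action sequences, rolls each one out with the model, and returns the
    sequence whose imagined final state is closest to the goal, with its cost.
    """
    rng = random.Random(seed)
    best_actions, best_cost = None, math.inf
    for _ in range(num_candidates):
        actions = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        cost = math.dist(rollout(start, actions), goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


def drift(actions, start=(0.0, 0.0), bias=0.05):
    """Distance between the biased and unbiased rollouts after each step."""
    errors = []
    for k in range(1, len(actions) + 1):
        biased = rollout(start, actions[:k], bias)
        exact = rollout(start, actions[:k])
        errors.append(math.dist(biased, exact))
    return errors


if __name__ == "__main__":
    actions, cost = plan(start=(0.0, 0.0), goal=(2.0, 1.0))
    print("imagined distance to goal:", cost)
    print("accumulated drift per step:", drift(actions))
```

In this toy setting the drift grows linearly with the rollout length, so a plan that looks good in imagination can end far from the goal when executed. The abstract describes the same failure mode for navigation world models, at much higher dimensionality.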