This paper introduces Interactive World Simulator, a framework for building interactive world models for robotics that uses consistency models for both image decoding and latent-space dynamics prediction. This approach enables fast and stable simulation of physical interactions: the learned models run at 15 FPS on a single RTX 4090 GPU and remain stable over interactions lasting more than 10 minutes. Policies trained on data generated within the world model perform comparably to those trained on the same amount of real-world data, and simulated performance correlates strongly with real-world performance.
Forget painstakingly collecting robot demonstrations in the real world: this interactive world simulator, learned from a moderate-sized robot interaction dataset, lets you train policies on simulated data that perform comparably to ones trained on real data.
Action-conditioned video prediction models (often referred to as world models) have shown strong potential for robotics applications, but existing approaches are often slow and struggle to capture physically consistent interactions over long horizons, limiting their usefulness for scalable robot policy training and evaluation. We present Interactive World Simulator, a framework for building interactive world models from a moderate-sized robot interaction dataset. Our approach leverages consistency models for both image decoding and latent-space dynamics prediction, enabling fast and stable simulation of physical interactions. In our experiments, the learned world models produce interaction-consistent pixel-level predictions and support stable long-horizon interactions for more than 10 minutes at 15 FPS on a single RTX 4090 GPU. Our framework enables scalable demonstration collection solely within the world models to train state-of-the-art imitation policies. Through extensive real-world evaluation across diverse tasks involving rigid objects, deformable objects, object piles, and their interactions, we find that policies trained on world-model-generated data perform comparably to those trained on the same amount of real-world data. Additionally, we evaluate policies both within the world models and in the real world across diverse tasks, and observe a strong correlation between simulated and real-world performance. Together, these results establish the Interactive World Simulator as a stable and physically consistent surrogate for scalable robotic data generation and faithful, reproducible policy evaluation.
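To put the reported long-horizon figures in perspective, the following back-of-the-envelope sketch converts the frame rate and duration into a step count and a per-frame time budget. It assumes one model prediction per displayed frame, which the abstract does not state explicitly; the function name `rollout_budget` is hypothetical and used only for illustration.

```python
def rollout_budget(fps: float, minutes: float) -> tuple[int, float]:
    """Return (number of predicted frames, time budget per frame in ms).

    Assumption: one world-model prediction per displayed frame.
    """
    steps = round(fps * minutes * 60)  # total frames generated over the rollout
    ms_per_step = 1000.0 / fps         # wall-clock budget for each frame
    return steps, ms_per_step


# Figures reported in the abstract: 15 FPS, more than 10 minutes.
steps, ms_per_step = rollout_budget(fps=15, minutes=10)
print(f"{steps} frames, {ms_per_step:.1f} ms per frame")
# prints "9000 frames, 66.7 ms per frame"
```

Under that assumption, a 10-minute interaction corresponds to at least 9,000 consecutive predictions, each of which must complete in roughly 67 ms, including both the latent dynamics step and the image decoding.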