Stanford HAI | UT Austin | Feb 18, 2026 | arXiv:2602.16229

Factored Latent Action World Models

Zizhao Wang, Chang Shi, Kevin Rohling, Amy Zhang, Peter Stone

AI Summary

This paper introduces the Factored Latent Action Model (FLAM), a factored dynamics framework that decomposes a scene into independent factors, each with its own latent action. Monolithic inverse and forward dynamics models infer a single latent action for the entire scene, which limits them in complex environments where multiple entities act at once; FLAM addresses this by learning a separate latent action for each factor. Experiments on simulated and real-world multi-entity datasets show that, compared with monolithic approaches, FLAM achieves higher prediction accuracy and better representation quality, and facilitates downstream policy learning.
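
One way to write the contrast, as a sketch in our own notation rather than the paper's: assume the scene at time t has already been decomposed into K factor states s_t^1, ..., s_t^K. A monolithic model infers one latent action z_t for the whole transition, while a factored model infers one latent action z_t^k per factor and predicts that factor's next value. The symbols q, p, q_k, p_k and K are illustrative assumptions, and the paper's per-factor models may condition on more context (for example, other factors) than shown here.

```latex
\begin{aligned}
\text{Monolithic:}\quad & z_t = q(s_t, s_{t+1}), & \hat{s}_{t+1} &= p(s_t, z_t) \\
\text{Factored:}\quad & z_t^k = q_k(s_t^k, s_{t+1}^k), & \hat{s}_{t+1}^k &= p_k(s_t^k, z_t^k), \quad k = 1, \dots, K
\end{aligned}
```

Here q and q_k play the role of inverse dynamics models and p and p_k the role of forward dynamics models.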

Key Contribution

Factored world models can disentangle the dynamics of multiple interacting entities, leading to more controllable video generation and improved policy learning.
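
A toy illustration of the controllability claim, under simplifying assumptions of our own rather than FLAM's actual architecture: each factor is a 2-D position, the hypothetical per-factor inverse dynamics function `infer_latent_action` returns the displacement, and the hypothetical forward function `predict_next` adds it back. Because each factor's prediction here depends only on that factor's state and latent action, editing one latent action changes one entity and leaves the others as they were.

```python
from typing import List, Tuple

# Hypothetical toy: each factor is a 2-D position (x, y).
Vec = Tuple[float, float]


def infer_latent_action(s_k: Vec, s_next_k: Vec) -> Vec:
    """Toy per-factor inverse dynamics: the latent action is the displacement."""
    return (s_next_k[0] - s_k[0], s_next_k[1] - s_k[1])


def predict_next(s_k: Vec, z_k: Vec) -> Vec:
    """Toy per-factor forward dynamics: apply the displacement to the state."""
    return (s_k[0] + z_k[0], s_k[1] + z_k[1])


def rollout(scene: List[Vec], actions: List[Vec]) -> List[Vec]:
    """Predict the next scene factor by factor; factor k sees only (s_k, z_k)."""
    return [predict_next(s_k, z_k) for s_k, z_k in zip(scene, actions)]


# Two entities (e.g., a robot gripper and an object), observed at t and t+1.
scene_t = [(0.0, 0.0), (5.0, 5.0)]
scene_t1 = [(1.0, 0.0), (5.0, 4.0)]

# Infer one latent action per factor from the observed transition.
latents = [infer_latent_action(a, b) for a, b in zip(scene_t, scene_t1)]

# Edit only factor 0's latent action (move up instead of right) and regenerate.
edited = [(0.0, 1.0)] + latents[1:]

print(rollout(scene_t, latents))  # [(1.0, 0.0), (5.0, 4.0)]
print(rollout(scene_t, edited))   # [(0.0, 1.0), (5.0, 4.0)]
```

The isolation is built into this toy by construction; with genuinely interacting entities, a learned factored model may need information from other factors to predict each one, so edits would not always be this cleanly separated.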

Abstract

Learning latent actions from action-free video has emerged as a powerful paradigm for scaling up controllable world model learning. Latent actions provide a natural interface for users to iteratively generate and manipulate videos. However, most existing approaches rely on monolithic inverse and forward dynamics models that learn a single latent action to control the entire scene, and therefore struggle in complex environments where multiple entities act simultaneously. This paper introduces Factored Latent Action Model (FLAM), a factored dynamics framework that decomposes the scene into independent factors, each inferring its own latent action and predicting its own next-step factor value. This factorized structure enables more accurate modeling of complex multi-entity dynamics and improves video generation quality in action-free video settings compared to monolithic models. Based on experiments on both simulation and real-world multi-entity datasets, we find that FLAM outperforms prior work in prediction accuracy and representation quality, and facilitates downstream policy learning, demonstrating the benefits of factorized latent action models.
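
The claim that a single scene-level latent action struggles when several entities act simultaneously can be illustrated with a deliberately tiny toy. This is our construction, not FLAM's architecture: each factor is a 2-D position, the monolithic model compresses the transition into one 2-D latent (here the mean displacement, a stand-in for a learned bottleneck) and applies it to every entity, and the factored model gives each factor its own 2-D latent. The loss is summed squared error on next-step factor values; the function names and the loss choice are assumptions.

```python
from typing import List, Tuple

# Hypothetical toy: each factor is a 2-D position (x, y).
Vec = Tuple[float, float]


def sq_err(a: Vec, b: Vec) -> float:
    """Squared Euclidean error between two 2-D factor values."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def monolithic_loss(s_t: List[Vec], s_t1: List[Vec]) -> float:
    """One latent action for the whole scene.

    Toy inverse dynamics: the scene-level latent is the mean displacement,
    a 2-D bottleneck no matter how many entities move.
    Toy forward dynamics: apply that single latent to every entity.
    """
    k = len(s_t)
    z = (sum(b[0] - a[0] for a, b in zip(s_t, s_t1)) / k,
         sum(b[1] - a[1] for a, b in zip(s_t, s_t1)) / k)
    preds = [(a[0] + z[0], a[1] + z[1]) for a in s_t]
    return sum(sq_err(p, b) for p, b in zip(preds, s_t1))


def factored_loss(s_t: List[Vec], s_t1: List[Vec]) -> float:
    """One 2-D latent action per factor; each factor predicts only itself."""
    total = 0.0
    for a, b in zip(s_t, s_t1):
        z_k = (b[0] - a[0], b[1] - a[1])         # per-factor inverse dynamics
        pred_k = (a[0] + z_k[0], a[1] + z_k[1])  # per-factor forward dynamics
        total += sq_err(pred_k, b)
    return total


# Two entities moving in different directions during the same time step.
s_t = [(0.0, 0.0), (5.0, 5.0)]
s_t1 = [(1.0, 0.0), (5.0, 4.0)]

print(monolithic_loss(s_t, s_t1))  # 1.0
print(factored_loss(s_t, s_t1))    # 0.0
```

In the real models the inverse and forward dynamics are learned networks and a monolithic latent can be larger than one factor's; the toy only shows that a fixed-size scene-level latent must share its capacity across entities, whereas in this setup the factored latent budget grows with the number of factors.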

Computer Vision | Robotics & Embodied AI | World Models & Planning

Citation Metrics

Citations: 0 | Influential citations: 0 | References: 0 | Year: 2026 | Venue: N/A
