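The paper introduces Solaris, a multiplayer video world model trained on Minecraft that simulates consistent multi-view observations. It addresses a key limitation of existing video world models, which only model a single agent's perspective. To support this, the authors built a data collection system for multiplayer environments that captures synchronized videos and actions from multiple agents, and they used it to collect 12.64 million multiplayer frames. Solaris is trained with a staged pipeline that moves from single-player to multiplayer modeling and combines bidirectional, causal, and Self Forcing training. The final stage uses Checkpointed Self Forcing, a memory-efficient variant that enables a longer-horizon teacher. The authors report that their architecture and training design outperform existing baselines. They also propose an evaluation framework that covers multiplayer movement, memory, grounding, building, and view consistency.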
Solaris lets you simulate consistent multi-view Minecraft observations, opening the door to more realistic and interactive multi-agent world models.
Existing action-conditioned video generation models (video world models) are limited to single-agent perspectives, failing to capture the multi-agent interactions of real-world environments. We introduce Solaris, a multiplayer video world model that simulates consistent multi-view observations. To enable this, we develop a multiplayer data system designed for robust, continuous, and automated data collection on video games such as Minecraft. Unlike prior platforms built for single-player settings, our system supports coordinated multi-agent interaction and synchronized capture of videos and actions. Using this system, we collect 12.64 million multiplayer frames and propose an evaluation framework for multiplayer movement, memory, grounding, building, and view consistency. We train Solaris using a staged pipeline that progressively transitions from single-player to multiplayer modeling, combining bidirectional, causal, and Self Forcing training. In the final stage, we introduce Checkpointed Self Forcing, a memory-efficient Self Forcing variant that enables a longer-horizon teacher. Results show that our architecture and training design outperform existing baselines. By open-sourcing our system and models, we hope to lay the groundwork for a new generation of multi-agent world models.
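The abstract contrasts bidirectional and causal training but does not describe how they differ. The sketch below is a generic illustration and not the authors' implementation. It builds the two kinds of attention mask commonly used in video diffusion transformers. In a bidirectional mask, every token attends to the whole clip. In a frame-level block-causal mask, a token attends only to its own frame and earlier frames, which is the property frame-by-frame autoregressive generation needs. Several details are assumptions made for illustration only: the token layout (frame, then player, then spatial token), the choice to let all players' views at the same timestep attend to one another, and the hypothetical function name `frame_attention_mask`.

```python
import numpy as np


def frame_attention_mask(num_frames: int, num_players: int, tokens_per_view: int,
                         causal: bool) -> np.ndarray:
    """Boolean mask where mask[q, k] is True if query token q may attend to key token k.

    Hypothetical layout (an assumption, not from the paper): tokens are ordered by
    frame, then player, then spatial token, so each frame holds
    num_players * tokens_per_view tokens.
    """
    tokens_per_frame = num_players * tokens_per_view
    total = num_frames * tokens_per_frame
    # Frame index of every token in the flattened sequence.
    frame_ids = np.arange(total) // tokens_per_frame
    if not causal:
        # Bidirectional: every token attends to every token in the clip.
        return np.ones((total, total), dtype=bool)
    # Block-causal: attend to all tokens (from every player's view) in the same
    # frame and in earlier frames, never to future frames.
    return frame_ids[:, None] >= frame_ids[None, :]


# Usage: 3 frames, 2 players, 1 token per view -> a 6x6 mask.
mask = frame_attention_mask(num_frames=3, num_players=2, tokens_per_view=1, causal=True)
print(mask.astype(int))
# [[1 1 0 0 0 0]
#  [1 1 0 0 0 0]
#  [1 1 1 1 0 0]
#  [1 1 1 1 0 0]
#  [1 1 1 1 1 1]
#  [1 1 1 1 1 1]]
```

In this example, player 0's token at frame 0 can see player 1's token at frame 0 but nothing from frames 1 or 2. Allowing attention across views within a timestep is one plausible way to let views share information at the same timestep. Whether Solaris does this is not stated in the abstract.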