InSpatio-WorldFM is an open-source, real-time frame model for spatial intelligence that, unlike video-based world models, generates each frame independently. It enforces multi-view spatial consistency using explicit 3D anchors and implicit spatial memory, preserving global scene geometry and fine-grained visual details across viewpoint changes. A three-stage training pipeline first turns a pretrained image diffusion model into a controllable frame model, then converts it into a real-time generator through few-step distillation.
Ditch the video: InSpatio-WorldFM achieves real-time spatial intelligence by generating frames independently, offering a low-latency alternative to video-based world models.
We present InSpatio-WorldFM, an open-source real-time frame model for spatial intelligence. Unlike video-based world models that rely on sequential frame generation and incur substantial latency due to window-level processing, InSpatio-WorldFM adopts a frame-based paradigm that generates each frame independently, enabling low-latency real-time spatial inference. By enforcing multi-view spatial consistency through explicit 3D anchors and implicit spatial memory, the model preserves global scene geometry while maintaining fine-grained visual details across viewpoint changes. We further introduce a progressive three-stage training pipeline that transforms a pretrained image diffusion model into a controllable frame model and finally into a real-time generator through few-step distillation. Experimental results show that InSpatio-WorldFM achieves strong multi-view consistency while supporting interactive exploration on consumer-grade GPUs, providing an efficient alternative to traditional video-based world models for real-time world simulation.
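The latency argument in the abstract can be illustrated with a small back-of-the-envelope sketch. It compares a generator that emits a whole window of frames per call with one that emits a single frame per call. The `GeneratorProfile` class, both helper functions, and all timing numbers are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class GeneratorProfile:
    """Hypothetical timing profile; values are illustrative, not measured."""
    frames_per_call: int     # frames produced by one model invocation
    seconds_per_call: float  # wall-clock time of one invocation


def response_latency(profile: GeneratorProfile) -> float:
    """Seconds from a new camera pose arriving to the first frame that
    reflects it. A call must finish before any of its frames can be shown."""
    return profile.seconds_per_call


def throughput_fps(profile: GeneratorProfile) -> float:
    """Average frames per second when calls run back to back."""
    return profile.frames_per_call / profile.seconds_per_call


# Assumed numbers: a 16-frame window taking 0.8 s versus one frame in 0.05 s.
video_window = GeneratorProfile(frames_per_call=16, seconds_per_call=0.8)
frame_model = GeneratorProfile(frames_per_call=1, seconds_per_call=0.05)

for name, profile in [("window-based", video_window), ("frame-based", frame_model)]:
    print(f"{name}: {throughput_fps(profile):.1f} fps, "
          f"{response_latency(profile) * 1000:.0f} ms response latency")
```

Under these assumed numbers both generators sustain the same average frame rate, but the window-based one reacts to a viewpoint change only after the full window has been computed. That gap in response time is the one a frame-based design aims to close.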