ZJU | May 29, 2026 | arXiv:2605.31158

Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

Jiacheng Lu, Haoyi Zhu, Sipei Yi, Enze Xie, Yu Li, Cheng Zhuo

AI Summary

This paper introduces Light Interaction, a training-free framework to accelerate inference for interactive video world models by exploiting trajectory-dependent adaptive computation. It achieves this by adaptively managing context, caching denoising steps, and using hardware-software co-designed 3D block sparse attention. Experiments on HY-WorldPlay and Matrix-Game-3.0 show up to 2.59x speedup without retraining while preserving visual quality.

Key Contribution

Interactive video world models can be accelerated by up to 2.59x without retraining by adapting context size and computation to the user's camera trajectory.

Abstract

Interactive video world models generate video chunk by chunk in response to user-controlled camera movements, enabling applications such as real-time game simulation, virtual scene navigation, and embodied AI training. However, scaling to long interactive trajectories is prohibitively expensive due to growing context memory, quadratic attention complexity, and repeated denoising steps. We present Light Interaction, a training-free inference acceleration framework for interactive video world models. Our key insight is that interaction naturally enables trajectory-dependent adaptive computation: retrieved spatial memory can be discarded during novel exploration, temporal context can be adjusted according to local latent dynamics, and early-step model outputs can be reused when the camera revisits familiar regions. Based on this insight, Light Interaction combines adaptive context management, denoising cache acceleration, and hardware-software co-designed 3D block sparse attention with fused Triton kernels. Evaluated on HY-WorldPlay and Matrix-Game-3.0, Light Interaction achieves up to 2.59x speedup without model retraining while maintaining competitive visual quality.
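The trajectory-dependent decisions described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `StepPlan` and `plan_chunk`, the pose-distance revisit test, every threshold, and the choice to widen the temporal window when latent dynamics are high are all assumptions made for illustration. The abstract says only that temporal context is adjusted according to local latent dynamics, without stating the direction.

```python
import math
from dataclasses import dataclass


@dataclass
class StepPlan:
    """Hypothetical per-chunk plan; field names are illustrative only."""
    use_spatial_memory: bool   # keep retrieved spatial memory in the context?
    temporal_window: int       # number of recent chunks kept as temporal context
    reuse_early_steps: bool    # reuse cached early-step model outputs?


def nearest_visited_distance(pose, visited):
    """Euclidean distance from `pose` to the closest previously visited pose."""
    if not visited:
        return math.inf
    return min(math.dist(pose, v) for v in visited)


def plan_chunk(pose, visited, latent_change, *,
               revisit_radius=1.0, dynamics_threshold=0.5,
               min_window=2, max_window=8):
    """Choose how much context and computation to spend on the next chunk.

    Assumptions (not taken from the paper):
      * a region counts as "familiar" when the camera pose lies within
        `revisit_radius` of any previously visited pose;
      * `latent_change` is a scalar summary of local latent dynamics;
      * high dynamics get the larger temporal window.
    """
    revisiting = nearest_visited_distance(pose, visited) <= revisit_radius
    window = max_window if latent_change > dynamics_threshold else min_window
    return StepPlan(
        use_spatial_memory=revisiting,   # discarded during novel exploration
        temporal_window=window,
        reuse_early_steps=revisiting,    # cache reuse only on revisits
    )


visited_poses = [(0.0, 0.0, 0.0)]
revisit_plan = plan_chunk((0.5, 0.0, 0.0), visited_poses, latent_change=0.1)
explore_plan = plan_chunk((5.0, 0.0, 0.0), visited_poses, latent_change=0.9)
```

In this sketch a revisit keeps the spatial memory and enables cache reuse, while novel exploration drops both, mirroring the two cases the abstract names. A real system would need a learned or calibrated familiarity test rather than a fixed radius.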

Architecture Design (Transformers, SSMs, MoE) | Inference & Quantization | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
