Adobe Research, HKU, UCI, UCSD, UPenn | Mar 31, 2026 | arXiv:2603.30045

OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation

Yuheng Liu, Xin Lin, Xinke Li, Baihan Yang, Chen Wang, Kalyan Sunkavalli, Yannick Hold-Geoffroy, Hao Tan, Kai Zhang, Xiaohui Xie, Zifan Shi, Yiwei Hu

AI Summary

OmniRoam is a framework for generating controllable, long-horizon panoramic videos for scene exploration. It works in two stages: a trajectory-controlled video generation model first produces a quick overview of the scene, which is then temporally extended and spatially upsampled into a long-range, high-resolution video. The model is trained on two newly introduced panoramic video datasets that combine synthetic and real-world captured videos, and the authors report that it outperforms state-of-the-art methods in visual quality, controllability, and long-term scene consistency.

Key Contribution

A panoramic video generation model that lets users roam through a scene with long-term spatial and temporal consistency, opening up new possibilities for virtual exploration.

Abstract

Modeling scenes using video generation models has garnered growing research interest in recent years. However, most existing approaches rely on perspective video models that synthesize only limited observations of a scene, leading to issues of completeness and global consistency. We propose OmniRoam, a controllable panoramic video generation framework that exploits the rich per-frame scene coverage and inherent long-term spatial and temporal consistency of panoramic representation, enabling long-horizon scene wandering. Our framework begins with a preview stage, where a trajectory-controlled video generation model creates a quick overview of the scene from a given input image or video. Then, in the refine stage, this video is temporally extended and spatially upsampled to produce long-range, high-resolution videos, thus enabling high-fidelity world wandering. To train our model, we introduce two panoramic video datasets that incorporate both synthetic and real-world captured videos. Experiments show that our framework consistently outperforms state-of-the-art methods in terms of visual quality, controllability, and long-term scene consistency, both qualitatively and quantitatively. We further showcase several extensions of this framework, including real-time video generation and 3D reconstruction. Code is available at https://github.com/yuhengliu02/OmniRoam.
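
To make the "rich per-frame scene coverage" claim concrete, the following is a minimal sketch that compares how much of the viewing sphere a single perspective frame covers against a full panoramic frame. It assumes the panoramas are stored in the common equirectangular layout, which the abstract does not state; the function names `perspective_coverage` and `equirect_pixel_to_direction` are hypothetical and are not taken from the OmniRoam code.

```python
import math


def perspective_coverage(hfov_deg: float, vfov_deg: float) -> float:
    """Fraction of the full sphere seen by a pinhole camera.

    Uses the solid angle of a rectangular pyramid:
    omega = 4 * asin(sin(hfov / 2) * sin(vfov / 2)).
    """
    half_h = math.radians(hfov_deg) / 2.0
    half_v = math.radians(vfov_deg) / 2.0
    solid_angle = 4.0 * math.asin(math.sin(half_h) * math.sin(half_v))
    return solid_angle / (4.0 * math.pi)


def equirect_pixel_to_direction(u: float, v: float, width: int, height: int):
    """Map an equirectangular pixel (assumed layout) to a unit view direction.

    Longitude spans [-pi, pi) across the width and latitude spans
    [pi/2, -pi/2] down the height, so every direction has a pixel.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


if __name__ == "__main__":
    # A 90 x 90 degree perspective view is one cube face: 1/6 of the sphere.
    print(f"perspective 90x90 coverage: {perspective_coverage(90, 90):.4f}")
    # An equirectangular frame covers the whole sphere by construction.
    print("panoramic coverage: 1.0000")
    print("direction at image centre:", equirect_pixel_to_direction(1.5, 0.5, 4, 2))
```

Under these assumptions, a single 90-degree perspective frame observes only about one sixth of the directions around the camera, whereas each panoramic frame observes all of them, which is the property the abstract relies on for global consistency.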

Computer Vision, Multimodal Models, World Models & Planning

Year: 2026

Venue: N/A
