Tsinghua AI | Adobe Research | Oregon | SCU | UC Davis | Virginia Tech | Mar 4, 2026 | arXiv:2603.03646

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Mohamed Elmoghany, Liangbing Zhao, Xiaoqian Shen, Subhojyoti Mukherjee, Yang Zhou, Gang Wu, Viet Dac Lai, Seung-Uk Yoon, Ryan A. Rossi, Abdullah Rashwan, Puneet Mathur, V. Manjunatha, Daksh Dangi, Chien Nguyen, Nedim Lipka, T. Bui, Krishna Kumar Singh, Ruiyi Zhang, Xiaolei Huang, Jaemin Cho, Yu Wang, Namyong Park, Zhe-Yan Tu, Hongjie Chen, Hoda Eldardiry, Nesreen Ahmed, T. Nguyen, Thien Nguyen, Dinesh Manocha, Mohamed Elhoseiny, Franck Dernoncourt

AI Summary

InfinityStory is a framework for generating long-form storytelling videos with consistent visual narratives, addressing limitations in background consistency, multi-subject shot transitions, and scalability. It introduces a background-consistent generation pipeline that preserves character identity and spatial relationships across scenes. A transition-aware video synthesis module generates smooth shot transitions, supported by a new synthetic dataset of 10,000 multi-subject transition sequences.

Key Contribution

InfinityStory targets hour-long video generation with consistent characters and backgrounds, using a framework built to produce smooth transitions between shots.

Abstract

Generating long-form storytelling videos with consistent visual narratives remains a significant challenge in video synthesis. We present a novel framework, a dataset, and a model that address three critical limitations: background consistency across shots, seamless multi-subject shot-to-shot transitions, and scalability to hour-long narratives. Our approach introduces a background-consistent generation pipeline that maintains visual coherence across scenes while preserving character identity and spatial relationships. We further propose a transition-aware video synthesis module that generates smooth shot transitions for complex scenarios involving multiple subjects entering or exiting frames, going beyond the single-subject limitations of prior work. To support this, we contribute a synthetic dataset of 10,000 multi-subject transition sequences covering underrepresented dynamic scene compositions. On VBench, InfinityStory achieves the highest Background Consistency (88.94), the highest Subject Consistency (82.11), and the best overall average rank (2.80), showing improved stability, smoother transitions, and better temporal coherence.

Computer Vision, Multimodal Models, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
