XJTU | May 27, 2026 | arXiv:2605.27891

SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

Zhida Zhang, Jie Ma, Zhan Peng, Haoxue Wu, Yang Han, Yan Han, Jun Liang, Jie Cao, Jing Li

AI Summary

SmartDirector is a video generation framework that conditions on multiple keyframes to improve control over narrative structure and temporal pacing. It operates in two stages: Director-Gen generates a low-resolution video from the keyframes, and Director-SR refines it using high-resolution keyframes as semantic anchors. The framework supports single-shot generation, multi-shot narrative synthesis, and video extension, and the authors report that it substantially outperforms existing state-of-the-art methods.

Key Contribution

SmartDirector generates cinematic videos with finer control over narrative structure and pacing by conditioning on multiple keyframes rather than on text prompts or first/last frames alone.

Abstract

The narrative quality of a video fundamentally determines its perceptual value. Although existing video generation methods can produce visually appealing content, they predominantly rely on sparse conditioning signals such as text prompts or first/last frames, which limits precise control over narrative structure and temporal pacing. In this paper, we propose SmartDirector, a framework that enhances the narrative capacity of video generation models through multiple keyframes. SmartDirector supports flexible generation scenarios including single-shot generation, multi-shot narrative synthesis, and video extension. The framework operates in two stages: Director-Gen generates a low-resolution video conditioned on the provided keyframes, and Director-SR refines the output by exploiting high-resolution keyframes as semantic anchors to recover fine-grained details. To enable robust multi-keyframe training, we construct a data pipeline that curates single-shot and multi-shot sequences from movies. Extensive experiments demonstrate that SmartDirector substantially outperforms existing state-of-the-art approaches. We will release the code to facilitate further research.
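The two-stage data flow described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names `director_gen_stub` and `director_sr_stub`, the array shapes, and the keyframe positions are hypothetical, and the learned models are replaced by trivial stand-ins (linear blending between keyframes for stage one, nearest-neighbour upsampling plus keyframe pasting for stage two). It shows only how low-resolution generation and high-resolution keyframe anchoring might fit together, not the authors' method.

```python
import numpy as np

def director_gen_stub(keyframes, positions, num_frames):
    """Stand-in for stage 1 (hypothetical): build a low-res clip by
    linearly blending between neighbouring low-res keyframes."""
    positions = np.asarray(positions)
    last = len(keyframes) - 1
    video = np.empty((num_frames,) + keyframes[0].shape, dtype=np.float32)
    for t in range(num_frames):
        # Keyframe at or before frame t, and the next one (clamped at the ends).
        i = int(np.clip(np.searchsorted(positions, t, side="right") - 1, 0, last))
        j = min(i + 1, last)
        span = positions[j] - positions[i]
        w = 0.0 if span == 0 else float(np.clip((t - positions[i]) / span, 0.0, 1.0))
        video[t] = (1.0 - w) * keyframes[i] + w * keyframes[j]
    return video

def director_sr_stub(low_video, hi_keyframes, positions, scale):
    """Stand-in for stage 2 (hypothetical): nearest-neighbour upsampling,
    then pasting the high-res keyframes back at their positions as anchors."""
    out = np.repeat(np.repeat(low_video, scale, axis=1), scale, axis=2)
    for pos, frame in zip(positions, hi_keyframes):
        out[pos] = frame
    return out

# Three constant-colour high-res keyframes placed at frames 0, 4, and 8.
scale = 2
positions = [0, 4, 8]
hi_keyframes = [np.full((8, 8, 3), v, dtype=np.float32) for v in (0.0, 1.0, 2.0)]
low_keyframes = [k[::scale, ::scale] for k in hi_keyframes]

low_video = director_gen_stub(low_keyframes, positions, num_frames=9)
final_video = director_sr_stub(low_video, hi_keyframes, positions, scale)
```

In this sketch the spacing of `positions` is what controls pacing: moving a keyframe earlier or later changes how many frames are spent transitioning into it.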

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 31
Year: 2026
Venue: N/A
