UCSD | Feb 17, 2026 | arXiv:2602.15287

Consistency-Preserving Diverse Video Generation

Xinshuang Liu, Runfa Blark Li, Truong Nguyen

AI Summary

The paper introduces a joint-sampling framework for flow-matching video generators to improve cross-video diversity in text-to-video generation while maintaining temporal consistency. It achieves this by applying diversity-driven updates and then selectively removing components that degrade a temporal-consistency objective, both computed in the latent space. Experiments demonstrate that the proposed method achieves comparable diversity to existing joint-sampling baselines, but with improved temporal consistency and color naturalness.

Key Contribution

Improves cross-video diversity in text-to-video generation while preserving temporal consistency. Both objectives are computed in latent space, which avoids video decoding and backpropagation through the video decoder.

Abstract

Text-to-video generation is expensive, so only a few samples are typically produced per prompt. In this low-sample regime, maximizing the value of each batch requires high cross-video diversity. Recent methods improve diversity for image generation, but for videos they often degrade within-video temporal consistency and require costly backpropagation through a video decoder. We propose a joint-sampling framework for flow-matching video generators that improves batch diversity while preserving temporal consistency. Our approach applies diversity-driven updates and then removes only the components that would decrease a temporal-consistency objective. To avoid image-space gradients, we compute both objectives with lightweight latent-space models, avoiding video decoding and decoder backpropagation. Experiments on a state-of-the-art text-to-video flow-matching model show diversity comparable to strong joint-sampling baselines while substantially improving temporal consistency and color naturalness. Code will be released.
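The step of applying a diversity-driven update and then removing only the components that would decrease the temporal-consistency objective can be illustrated with a gradient-projection sketch. This is a minimal illustration under stated assumptions, not the authors' released code: it assumes the "removed component" is the part of the diversity update that points against the gradient of the consistency objective, and it treats latents as flat vectors. The names `dot` and `remove_conflicting_component` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized flat vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_conflicting_component(
    diversity_update: Sequence[float],
    consistency_grad: Sequence[float],
    eps: float = 1e-12,
) -> Vector:
    """Return the diversity update with its consistency-decreasing part removed.

    Assumption (not stated in the abstract): to first order, an update d
    decreases the consistency objective when <d, g> < 0, where g is the
    gradient of that objective with respect to the latent. In that case d is
    projected onto the hyperplane orthogonal to g; otherwise it is kept as-is.
    """
    alignment = dot(diversity_update, consistency_grad)
    grad_norm_sq = dot(consistency_grad, consistency_grad)

    # No conflict, or a (near-)zero gradient: leave the update unchanged.
    if alignment >= 0.0 or grad_norm_sq < eps:
        return list(diversity_update)

    # Subtract the projection of the update onto the consistency gradient.
    scale = alignment / grad_norm_sq
    return [d - scale * g for d, g in zip(diversity_update, consistency_grad)]


# Toy example: the second coordinate of the update opposes the gradient.
update = [1.0, -1.0]
grad = [0.0, 1.0]
safe_update = remove_conflicting_component(update, grad)
```

In this toy case the filtered update keeps its first coordinate and loses the coordinate that opposed the consistency gradient, so to first order the consistency objective is no longer reduced. In a real sampler the two vectors would come from lightweight latent-space models, as the abstract describes, rather than from hand-written lists.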

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
