The paper introduces PropFly, a training pipeline for propagation-based video editing that circumvents the need for paired video datasets by using on-the-fly supervision from pre-trained video diffusion models (VDMs). PropFly synthesizes diverse 'source' and 'edited' latent pairs by leveraging one-step clean latent estimations from intermediate noised latents with varying Classifier-Free Guidance (CFG) scales. The method trains an adapter attached to the pre-trained VDM to propagate edits via a Guidance-Modulated Flow Matching (GMFM) loss, achieving state-of-the-art performance on various video editing tasks.
Skip the expensive paired video data: PropFly trains video editing propagation models by generating supervision signals directly from pre-trained video diffusion models.
Propagation-based video editing enables precise user control by propagating a single edited frame into the following frames while maintaining the original context, such as motion and structure. However, training such models requires large-scale paired (source and edited) video datasets, which are costly and complex to acquire. Hence, we propose PropFly, a training pipeline for Propagation-based video editing that relies on on-the-Fly supervision from pre-trained video diffusion models (VDMs) instead of off-the-shelf or precomputed paired video editing datasets. Specifically, PropFly leverages one-step clean latent estimations from intermediate noised latents with varying Classifier-Free Guidance (CFG) scales to synthesize diverse pairs of 'source' (low-CFG) and 'edited' (high-CFG) latents on-the-fly. The source latent provides the structural information of the video, while the edited latent provides the target transformation for learning propagation. Our pipeline enables an additional adapter attached to the pre-trained VDM to learn to propagate edits via a Guidance-Modulated Flow Matching (GMFM) loss, which guides the model to replicate the target transformation. Our on-the-fly supervision ensures that the model learns temporally consistent and dynamic transformations. Extensive experiments demonstrate that PropFly significantly outperforms state-of-the-art methods on various video editing tasks, producing high-quality editing results.
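The pair-synthesis step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: a rectified-flow convention where the noised latent is x_t = (1 - t) * x0 + t * noise and the model predicts the velocity noise - x0; the standard CFG combination of unconditional and conditional predictions; and illustrative guidance scales of 1.0 (low) and 7.0 (high). The function names are hypothetical, and latents are flat Python lists standing in for video latent tensors. The GMFM loss is not sketched because the abstract does not define it.

```python
from typing import List, Tuple

Latent = List[float]  # stand-in for a flattened video latent tensor


def cfg_velocity(v_uncond: Latent, v_cond: Latent, scale: float) -> Latent:
    """Standard classifier-free guidance: move from the unconditional
    prediction toward (and past) the conditional one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(v_uncond, v_cond)]


def one_step_clean_estimate(x_t: Latent, v: Latent, t: float) -> Latent:
    """One-step clean latent estimate.

    Assumed convention: x_t = (1 - t) * x0 + t * noise and v = noise - x0,
    which gives x0 = x_t - t * v.
    """
    return [x - t * vi for x, vi in zip(x_t, v)]


def synthesize_pair(
    x_t: Latent,
    v_uncond: Latent,
    v_cond: Latent,
    t: float,
    low_scale: float,
    high_scale: float,
) -> Tuple[Latent, Latent]:
    """Build a ('source', 'edited') latent pair from the same noised latent
    by varying only the CFG scale."""
    source = one_step_clean_estimate(x_t, cfg_velocity(v_uncond, v_cond, low_scale), t)
    edited = one_step_clean_estimate(x_t, cfg_velocity(v_uncond, v_cond, high_scale), t)
    return source, edited


# Toy data: in practice the two velocities would come from a pre-trained VDM.
x0 = [1.0, -2.0, 0.5]
noise = [0.3, 0.1, -0.4]
t = 0.6
x_t = [(1 - t) * a + t * n for a, n in zip(x0, noise)]
v_exact = [n - a for a, n in zip(x0, noise)]   # plays the unconditional prediction
v_cond = [v + 0.2 for v in v_exact]            # pretend the prompt shifts the prediction

source, edited = synthesize_pair(x_t, v_exact, v_cond, t, low_scale=1.0, high_scale=7.0)
```

Both latents are derived from the same noised latent, so in this toy setup they differ only by the guidance-dependent shift. This is one way to read the abstract's statement that the source latent carries structure while the edited latent carries the target transformation.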