DynaEdit, a novel training-free video editing method, leverages pretrained text-to-video flow models to enable versatile edits, including modifying actions, inserting interacting objects, and introducing global effects. It targets two failure modes that arise when inversion-free approaches are adapted to unconstrained editing, low-frequency misalignment and high-frequency jitter, and introduces new mechanisms to overcome them. In experiments, DynaEdit achieves state-of-the-art results on complex text-based video editing tasks compared to existing methods.
Forget finetuning: DynaEdit unlocks complex video edits like action modification and object insertion without any training, using off-the-shelf pretrained text-to-video flow models.
Controlled video generation has seen drastic improvements in recent years. However, editing actions and dynamic events, or inserting content that should affect the behaviors of other objects in real-world videos, remains a major challenge. Existing trained models struggle with complex edits, likely due to the difficulty of collecting relevant training data. Similarly, existing training-free methods are inherently restricted to structure- and motion-preserving edits and do not support modification of motion or interactions. Here, we introduce DynaEdit, a training-free editing method that unlocks versatile video editing capabilities with pretrained text-to-video flow models. Our method relies on the recently introduced inversion-free approach, which does not intervene in the model internals, and is thus model-agnostic. We show that naively attempting to adapt this approach to general unconstrained editing results in severe low-frequency misalignment and high-frequency jitter. We explain the sources of these phenomena and introduce novel mechanisms for overcoming them. Through extensive experiments, we show that DynaEdit achieves state-of-the-art results on complex text-based video editing tasks, including modifying actions, inserting objects that interact with the scene, and introducing global effects.
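To make "inversion-free" concrete, here is a minimal sketch of the basic update it is assumed to refer to (FlowEdit-style editing with rectified-flow models). It is not DynaEdit itself and contains none of its misalignment or jitter fixes. Rather than inverting the source to noise, the edit path starts at the source sample and is integrated using only the difference between the target-conditioned and source-conditioned velocities, with both evaluated on paths that share the same noise. The models are treated as black boxes, which is why the approach is model-agnostic. The names `make_point_mass_velocity` and `inversion_free_edit` are hypothetical, and the "models" are toy analytic velocity fields for a distribution concentrated at a single point, chosen only so that the example runs without a real network.

```python
import numpy as np

def make_point_mass_velocity(mu):
    """Hypothetical toy 'model': exact rectified-flow velocity when all data equals mu.
    Convention assumed here: Z_t = (1 - t) * X0 + t * noise, so t = 1 is pure noise."""
    mu = np.asarray(mu, dtype=float)

    def velocity(z, t):
        # dZ_t/dt = noise - X0, which equals (z - mu) / t for this point-mass distribution
        return (z - mu) / t

    return velocity

def inversion_free_edit(x_src, v_src, v_tar, n_steps=50, t_max=1.0, seed=0):
    """Sketch of a FlowEdit-style inversion-free edit (assumed form, not DynaEdit)."""
    rng = np.random.default_rng(seed)
    x_src = np.asarray(x_src, dtype=float)
    ts = np.linspace(t_max, 0.0, n_steps + 1)  # integrate from noisy t_max down to t = 0
    z_edit = x_src.copy()                      # the edit path starts at the source sample
    for t, t_next in zip(ts[:-1], ts[1:]):
        noise = rng.standard_normal(x_src.shape)
        z_src = (1.0 - t) * x_src + t * noise   # noisy source on the forward path
        z_tar = z_edit + z_src - x_src          # same noise offset applied to the edit path
        dv = v_tar(z_tar, t) - v_src(z_src, t)  # only the velocity difference drives the edit
        z_edit = z_edit + (t_next - t) * dv     # Euler step toward the data end (t = 0)
    return z_edit
```

In this toy setting the sampled noise cancels out of `dv` exactly. Editing `mu_src` with a target field centered at `mu_tar` therefore lands on `mu_tar`, and passing identical source and target fields leaves the input unchanged. Real text-conditioned video models do not cancel like this, which is where artifacts such as the misalignment and jitter described above can appear.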