TAU | Dec 2, 2025 | arXiv:2512.03013

In-Context Sync-LoRA for Portrait Video Editing

Sagi Polaczek, Or Patashnik, Ali Mahdavi-Amiri, Daniel Cohen-Or

AI Summary

The paper introduces Sync-LoRA, a method for portrait video editing that propagates edits from the first frame through the entire sequence while maintaining temporal synchronization and identity consistency using an image-to-video diffusion model. To achieve accurate synchronization, the method trains an in-context LoRA on paired videos depicting identical motion but differing in appearance, which are automatically generated and filtered based on temporal alignment. Results demonstrate that Sync-LoRA generalizes to unseen identities and diverse edits, achieving high visual fidelity and strong temporal coherence.

Key Contribution

Sync-LoRA produces synchronized portrait video edits by propagating an edit made to a single frame across the whole sequence while maintaining temporal coherence and identity consistency, and it generalizes to unseen identities.

Abstract

Editing portrait videos is a challenging task that requires flexible yet precise control over a wide range of modifications, such as appearance changes, expression edits, or the addition of objects. The key difficulty lies in preserving the subject's original temporal behavior, demanding that every edited frame remains precisely synchronized with the corresponding source frame. We present Sync-LoRA, a method for editing portrait videos that achieves high-quality visual modifications while maintaining frame-accurate synchronization and identity consistency. Our approach uses an image-to-video diffusion model, where the edit is defined by modifying the first frame and then propagated to the entire sequence. To enable accurate synchronization, we train an in-context LoRA using paired videos that depict identical motion trajectories but differ in appearance. These pairs are automatically generated and curated through a synchronization-based filtering process that selects only the most temporally aligned examples for training. This training setup teaches the model to combine motion cues from the source video with the visual changes introduced in the edited first frame. Trained on a compact, highly curated set of synchronized human portraits, Sync-LoRA generalizes to unseen identities and diverse edits (e.g., modifying appearance, adding objects, or changing backgrounds), robustly handling variations in pose and expression. Our results demonstrate high visual fidelity and strong temporal coherence, achieving a robust balance between edit fidelity and precise motion preservation.
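The synchronization-based filtering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how temporal alignment is measured, so everything below is an assumption made for illustration: each video is reduced to a hypothetical per-frame motion signal (for example, a mouth-opening measurement), alignment is scored with Pearson correlation, and the names `VideoPair`, `sync_score`, and `select_most_aligned` are invented here, not taken from the paper.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class VideoPair:
    """Hypothetical training pair: same motion, different appearance."""
    name: str
    source_signal: Sequence[float]  # assumed per-frame motion signal of the source video
    edited_signal: Sequence[float]  # the same signal measured on the edited video


def sync_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two per-frame signals (an assumed alignment metric)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("signals must have the same length and at least 2 frames")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A constant signal carries no motion information; treat it as unaligned.
        return 0.0
    return cov / sqrt(var_a * var_b)


def select_most_aligned(pairs: Sequence[VideoPair], keep: int) -> List[VideoPair]:
    """Rank pairs by alignment score and keep only the top `keep` for training."""
    ranked = sorted(
        pairs,
        key=lambda p: sync_score(p.source_signal, p.edited_signal),
        reverse=True,
    )
    return ranked[:keep]


if __name__ == "__main__":
    candidates = [
        VideoPair("aligned", [0, 1, 2, 3], [0, 2, 4, 6]),
        VideoPair("shifted", [0, 1, 0, 1], [1, 0, 1, 0]),
        VideoPair("flat", [0, 1, 2, 3], [1, 1, 1, 1]),
    ]
    for pair in select_most_aligned(candidates, keep=2):
        print(pair.name)
```

Ranking and keeping a fixed number of pairs mirrors the abstract's description of selecting "only the most temporally aligned examples"; a fixed score threshold would be an equally plausible reading.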

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 61

Year: 2025

Venue: arXiv.org
