NOVA is a new framework for unpaired video editing that pairs a sparse control branch, which draws semantic guidance from user-edited keyframes, with a dense synthesis branch that maintains fidelity and coherence by incorporating motion and texture from the original video. To eliminate the need for paired data, the authors introduce a degradation-simulation training strategy in which the model learns motion reconstruction and temporal consistency by training on artificially degraded videos. Experiments show NOVA outperforms existing approaches in edit fidelity, motion preservation, and temporal coherence.
Achieve high-fidelity, temporally coherent video editing without paired training data by combining sparse semantic control with dense motion and texture synthesis.
Recent video editing models have achieved impressive results, but most still require large-scale paired datasets. Collecting such naturally aligned pairs at scale remains highly challenging and constitutes a critical bottleneck, especially for local video edits. Existing workarounds transfer image editing to video through global motion control to sidestep paired data, but such designs struggle to maintain background and temporal consistency. In this paper, we propose NOVA: Sparse Control & Dense Synthesis, a new framework for unpaired video editing. Specifically, the sparse branch provides semantic guidance through user-edited keyframes distributed across the video, while the dense branch continuously incorporates motion and texture information from the original video to maintain high fidelity and coherence. Moreover, we introduce a degradation-simulation training strategy that enables the model to learn motion reconstruction and temporal consistency by training on artificially degraded videos, thus eliminating the need for paired data. Our extensive experiments demonstrate that NOVA outperforms existing approaches in edit fidelity, motion preservation, and temporal coherence.
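To make the two-branch design concrete, here is a minimal PyTorch sketch of how sparse keyframe guidance and dense per-frame conditioning might be fused. The paper describes the branches only at a high level, so every name here (`SparseControlBranch`, `DenseSynthesisBranch`, `NOVAConditioner`), the feature shapes, and the concatenation-based fusion are illustrative assumptions, not NOVA's actual architecture.

```python
# Hypothetical sketch of NOVA's two-branch conditioning, NOT the authors' code.
# Module names, shapes, and the fusion scheme are assumptions for illustration.
import torch
import torch.nn as nn


class SparseControlBranch(nn.Module):
    """Encodes a few user-edited keyframes and scatters their features
    to the temporal positions they occupy in the video."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, keyframes, key_indices, num_frames):
        # keyframes: (K, 3, H, W) edited frames; key_indices: their frame ids.
        feats = self.encoder(keyframes)                 # (K, C, H, W)
        k, c, h, w = feats.shape
        out = feats.new_zeros(num_frames, c, h, w)      # zeros at non-key frames
        out[key_indices] = feats                        # sparse semantic guidance
        return out


class DenseSynthesisBranch(nn.Module):
    """Encodes every frame of the original video to supply motion and
    texture cues at all timesteps."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, video):
        # video: (T, 3, H, W) original frames.
        return self.encoder(video)                      # (T, C, H, W)


class NOVAConditioner(nn.Module):
    """Fuses sparse guidance with dense motion/texture features; the fused
    signal would condition a video generation backbone (not modeled here)."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.sparse = SparseControlBranch(dim)
        self.dense = DenseSynthesisBranch(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, video, keyframes, key_indices):
        t = video.shape[0]
        s = self.sparse(keyframes, key_indices, t)
        d = self.dense(video)
        return self.fuse(torch.cat([s, d], dim=1))      # (T, C, H, W)


video = torch.randn(16, 3, 64, 64)                      # original clip
keyframes = torch.randn(3, 3, 64, 64)                   # user-edited frames
cond = NOVAConditioner()(video, keyframes, torch.tensor([0, 8, 15]))
print(cond.shape)                                       # torch.Size([16, 64, 64, 64])
```

The point of the sketch is the asymmetry: the sparse branch contributes features only at the keyframe positions (zeros elsewhere), while the dense branch contributes at every frame, which matches the abstract's split between semantic guidance and continuous motion/texture conditioning.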
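Likewise, a hedged sketch of one degradation-simulation training step. The specific degradations (downsample-upsample plus Gaussian noise), the L1 reconstruction loss, the idea of keeping keyframes clean as stand-ins for future user edits, and the `TinyPropagator` placeholder model are all assumptions for illustration; the abstract states only that training on artificially degraded videos removes the need for paired data.

```python
# Hypothetical degradation-simulation training step, NOT the authors' recipe.
# Degradations, loss, and the stand-in model are assumptions for illustration.
import torch
import torch.nn.functional as F


def degrade(video: torch.Tensor) -> torch.Tensor:
    """Simulate a corrupted input: downsample/upsample blur + Gaussian noise."""
    t, c, h, w = video.shape
    low = F.interpolate(video, scale_factor=0.5, mode="bilinear",
                        align_corners=False)
    blurred = F.interpolate(low, size=(h, w), mode="bilinear",
                            align_corners=False)
    return blurred + 0.05 * torch.randn_like(blurred)


class TinyPropagator(torch.nn.Module):
    """Placeholder for the real editing model; it ignores the keyframes and
    just maps degraded frames back toward RGB, to keep the example runnable."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, degraded, keyframes, key_indices):
        return self.net(degraded)


def training_step(model, optimizer, clean_video, key_indices):
    # Keyframes stay clean, playing the role of "user edits" at inference;
    # the rest of the clip is degraded, so the model must reconstruct motion
    # and texture consistently across time -- no paired edits required.
    degraded = degrade(clean_video)
    keyframes = clean_video[key_indices]
    pred = model(degraded, keyframes, key_indices)      # (T, 3, H, W)
    loss = F.l1_loss(pred, clean_video)                 # reconstruction target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


model = TinyPropagator()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
clip = torch.rand(16, 3, 64, 64)
print(training_step(model, opt, clip, torch.tensor([0, 8, 15])))
```

The design intuition under these assumptions: because the clean video serves as its own reconstruction target, the degraded clip and the original form a "pair" for free, which is exactly how a degradation-simulation strategy can substitute for collected before/after editing pairs.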