This paper introduces Controllable Generative Video Compression (CGVC), a new paradigm for video compression that balances perceptual realism and signal fidelity by using keyframes and dense per-frame control priors to guide non-keyframe generation. A color-distance-guided keyframe selection algorithm is developed to improve color accuracy. Experiments demonstrate that CGVC outperforms previous perceptual video compression methods in both signal fidelity and perceptual quality.
A new controllable approach lets generative video compression improve both perceptual realism and signal fidelity.
Perceptual video compression adopts generative video modeling to improve perceptual realism but frequently sacrifices signal fidelity, diverging from the goal of video compression, which is to faithfully reproduce the visual signal. To alleviate this tension between perception and fidelity, we propose the Controllable Generative Video Compression (CGVC) paradigm, which generates details faithfully under the guidance of multiple visual conditions. Under this paradigm, representative keyframes of the scene are coded and used to provide structural priors for non-keyframe generation. A dense per-frame control prior is additionally coded to better preserve the finer structure and semantics of each non-keyframe. Guided by these priors, non-keyframes are reconstructed by a controllable video generation model with temporal and content consistency. Furthermore, to accurately recover the color information of the video, we develop a color-distance-guided keyframe selection algorithm that adaptively chooses keyframes. Experimental results show that CGVC outperforms previous perceptual video compression methods in terms of both signal fidelity and perceptual quality.
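The abstract does not say how the color distance is computed or how keyframes are chosen from it. As a rough illustration of the general idea only, the sketch below picks a new keyframe whenever a frame's mean color drifts too far from the last keyframe. The names `mean_color` and `select_keyframes`, the Euclidean RGB distance, the greedy rule, and the `threshold` value are all assumptions made for this example, not the authors' algorithm.

```python
import math


def mean_color(frame):
    """Average (R, G, B) of a frame given as a list of (r, g, b) tuples."""
    n = len(frame)
    return tuple(sum(pixel[c] for pixel in frame) / n for c in range(3))


def select_keyframes(frames, threshold=30.0):
    """Hypothetical greedy selection (an assumption, not the paper's method).

    The first frame is always a keyframe. A later frame becomes a keyframe
    when the Euclidean distance between its mean color and the mean color
    of the most recent keyframe exceeds `threshold`.
    """
    if not frames:
        return []
    keyframes = [0]
    reference = mean_color(frames[0])
    for index in range(1, len(frames)):
        current = mean_color(frames[index])
        if math.dist(current, reference) > threshold:
            keyframes.append(index)
            reference = current
    return keyframes


# Toy "video": four tiny frames of four pixels each.
dark = [(10, 10, 10)] * 4
slightly_warmer = [(15, 12, 10)] * 4
red = [(200, 20, 20)] * 4
frames = [dark, slightly_warmer, red, red]

print(select_keyframes(frames))  # prints [0, 2]
```

In this toy input the second frame stays within the threshold of the first, so it would be treated as a non-keyframe, while the shift to red triggers a new keyframe. A real codec would likely use a perceptual color space and weigh the bitrate cost of each extra keyframe, which this sketch ignores.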