Mar 17, 2026 · arXiv:2603.16864

SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

Jiongze Yu, Xiangbo Gao, Pooja Verlani, Akshay Gadde, Yili Wang, Balu Adsumilli, Zhengzhong Tu

AI Summary

SparkVSR is an interactive video super-resolution (VSR) framework that uses sparse keyframes as a control signal, allowing users to guide the super-resolution process. The method uses a keyframe-conditioned latent-pixel two-stage training pipeline to fuse low-resolution video latents with sparsely encoded high-resolution keyframe latents, enabling robust cross-space propagation and perceptual detail refinement. Experiments on multiple VSR benchmarks report improved temporal consistency and strong restoration quality, surpassing baselines by up to 24.6% on CLIP-IQA. The framework also applies, without modification, to tasks such as old-film restoration and video style transfer.

Key Contribution

Control video super-resolution with a few keyframes: SparkVSR lets you guide the process and fix artifacts, unlike black-box VSR models.

Abstract

Video Super-Resolution (VSR) aims to restore high-quality video frames from low-resolution (LR) inputs, yet most existing VSR approaches behave like black boxes at inference time: users cannot reliably correct unexpected artifacts and can only accept whatever the model produces. In this paper, we propose a novel interactive VSR framework dubbed SparkVSR that makes sparse keyframes a simple and expressive control signal. Specifically, users can first, optionally, super-resolve a small set of keyframes using any off-the-shelf image super-resolution (ISR) model; SparkVSR then propagates the keyframe priors to the entire video sequence while remaining grounded by the original LR video motion. Concretely, we introduce a keyframe-conditioned latent-pixel two-stage training pipeline that fuses LR video latents with sparsely encoded HR keyframe latents to learn robust cross-space propagation and refine perceptual details. At inference time, SparkVSR supports flexible keyframe selection (manual specification, codec I-frame extraction, or random sampling) and a reference-free guidance mechanism that continuously balances keyframe adherence and blind restoration, ensuring robust performance even when reference keyframes are absent or imperfect. Experiments on multiple VSR benchmarks demonstrate improved temporal consistency and strong restoration quality, surpassing baselines by up to 24.6%, 21.8%, and 5.6% on CLIP-IQA, DOVER, and MUSIQ, respectively, enabling controllable, keyframe-driven video super-resolution. Moreover, we demonstrate that SparkVSR is a generic interactive, keyframe-conditioned video processing framework, as it can be applied out of the box to unseen tasks such as old-film restoration and video style transfer. Our project page is available at: https://sparkvsr.github.io/
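The inference-time controls described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`select_keyframes`, `sparse_condition`, `blend_guidance`), the zero-filled conditioning layout, and the linear blend used for guidance are all assumptions made for illustration, and I-frame positions are taken as an input list rather than extracted from a codec.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def select_keyframes(num_frames: int, mode: str, count: int = 0,
                     indices: Optional[Sequence[int]] = None,
                     seed: int = 0) -> List[int]:
    """Pick keyframe indices (hypothetical helper).

    "manual" and "iframe" take indices supplied by the caller (for "iframe",
    assumed to come from a video decoder); "random" samples `count` frames.
    """
    if mode in ("manual", "iframe"):
        if indices is None:
            raise ValueError(f"mode '{mode}' needs explicit indices")
        chosen = {i for i in indices if 0 <= i < num_frames}
    elif mode == "random":
        rng = random.Random(seed)
        chosen = set(rng.sample(range(num_frames), min(count, num_frames)))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(chosen)


def sparse_condition(num_frames: int, keyframe_latents: Dict[int, List[float]],
                     latent_dim: int) -> Tuple[List[List[float]], List[int]]:
    """Place keyframe latents at their frame positions and zeros elsewhere.

    Assumed layout: one latent vector per frame plus a 0/1 mask marking
    which frames carry a reference.
    """
    cond = [[0.0] * latent_dim for _ in range(num_frames)]
    mask = [0] * num_frames
    for t, z in keyframe_latents.items():
        cond[t] = list(z)
        mask[t] = 1
    return cond, mask


def blend_guidance(blind_pred: Sequence[float], keyframe_pred: Sequence[float],
                   scale: float) -> List[float]:
    """Assumed linear blend: scale=0 is blind restoration, scale=1 follows
    the keyframe-conditioned prediction."""
    return [b + scale * (k - b) for b, k in zip(blind_pred, keyframe_pred)]


keyframes = select_keyframes(10, "manual", indices=[7, 2, 2, 12])
cond, mask = sparse_condition(4, {1: [0.5, 0.5]}, latent_dim=2)
mixed = blend_guidance([0.0, 2.0], [1.0, 4.0], scale=0.5)
```

In this sketch, out-of-range and duplicate indices are dropped during selection, and the blend scale is the single knob that trades keyframe adherence against blind restoration, so an imperfect reference can be down-weighted rather than discarded.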

Computer Vision Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 68

Year: 2026

Venue: N/A
