This paper introduces GS-STVSR, a framework for Continuous Spatio-Temporal Video Super-Resolution (C-STVSR) that uses 2D Gaussian Splatting to avoid computationally expensive dense grid queries. The method models the spatio-temporal evolution of Gaussian kernels using optical flow-guided motion and covariance resampling alignment. Experiments on Vid4, GoPro, and Adobe240 report state-of-the-art quality, near-constant inference time at conventional temporal scales (2x to 8x), and a speedup of more than 3x at a 32x temporal scale compared to INR-based methods.
Forget slow, INR-based video super-resolution: 2D Gaussian Splatting now delivers state-of-the-art quality with a speedup of more than 3x at extreme temporal scales.
Continuous Spatio-Temporal Video Super-Resolution (C-STVSR) aims to simultaneously enhance the spatial resolution and frame rate of videos by arbitrary scale factors, offering greater flexibility than fixed-scale methods that are constrained by predefined upsampling ratios. In recent years, methods based on Implicit Neural Representations (INR) have made significant progress in C-STVSR by learning continuous mappings from spatio-temporal coordinates to pixel values. However, these methods fundamentally rely on dense pixel-wise grid queries, causing computational cost to scale linearly with the number of interpolated frames and severely limiting inference efficiency. We propose GS-STVSR, an ultra-efficient C-STVSR framework based on 2D Gaussian Splatting (2D-GS) that drives the spatio-temporal evolution of Gaussian kernels through continuous motion modeling, bypassing dense grid queries entirely. We exploit the strong temporal stability of covariance parameters for lightweight intermediate fitting, design an optical flow-guided motion module to derive Gaussian position and color at arbitrary time steps, introduce a covariance resampling alignment module to prevent covariance drift, and propose an adaptive offset window for large-scale motion. Extensive experiments on Vid4, GoPro, and Adobe240 show that GS-STVSR achieves state-of-the-art quality across all benchmarks. Moreover, its inference time remains nearly constant at conventional temporal scales (×2 to ×8), and it delivers a speedup of more than ×3 at the extreme temporal scale of ×32, demonstrating strong practical applicability.
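To make the core idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a set of 2D Gaussians whose covariances stay fixed over time (in the spirit of the temporal stability the abstract describes) and whose centers move along a per-Gaussian displacement vector, a simple linear stand-in for the optical flow-guided motion module. The names `render_gaussians` and `gaussians_at_time`, the linear motion model, and the plain additive blending are all assumptions made for illustration.

```python
import numpy as np

def gaussians_at_time(means0, flow, t):
    """Toy motion model (assumption): shift each center linearly along its
    displacement vector. means0 and flow are (N, 2) arrays in (x, y) order."""
    return means0 + t * flow

def render_gaussians(means, covs, colors, height, width):
    """Splat N 2D Gaussians onto a (height, width, 3) canvas.

    Each Gaussian contributes color * exp(-0.5 * d^T cov^-1 d), where d is
    the offset from its center. Contributions are summed (assumed blending).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2)
    image = np.zeros((height, width, 3))
    for mean, cov, color in zip(means, covs, colors):
        diff = pixels - mean                       # offset of every pixel
        inv_cov = np.linalg.inv(cov)
        # Squared Mahalanobis distance per pixel.
        dist2 = np.einsum("hwi,ij,hwj->hw", diff, inv_cov, diff)
        image += np.exp(-0.5 * dist2)[..., None] * color
    return image

# Usage: one Gaussian moving 8 pixels to the right between t=0 and t=1.
means0 = np.array([[4.0, 4.0]])
flow = np.array([[8.0, 0.0]])
covs = np.array([[[2.0, 0.0], [0.0, 2.0]]])
colors = np.array([[1.0, 0.5, 0.25]])

# Any intermediate time step reuses the same Gaussians; only centers change.
frames = [
    render_gaussians(gaussians_at_time(means0, flow, t), covs, colors, 16, 16)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0)
]
```

In this sketch, producing an extra intermediate frame only requires updating N Gaussian centers before rasterizing, rather than evaluating a network at every output pixel coordinate. A practical renderer would also restrict each Gaussian to a local window instead of touching every pixel as the loop above does.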