FC-VFI improves video frame interpolation by applying a temporal modeling strategy to latent sequences, so that interpolated frames inherit fidelity cues from the start and end frames. Semantic matching lines provide structure-aware motion guidance that improves motion consistency, and a temporal difference loss further reduces temporal inconsistencies. Experiments show that FC-VFI achieves high performance and structural integrity, supporting 4x and 8x interpolation for high-FPS slow-motion video generation at high resolution.
Achieve 240 FPS slow-motion video from a 30 FPS source while preserving visual fidelity and motion consistency through structure-aware motion guidance.
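The frame-rate figures follow from simple arithmetic: an interpolation factor of k multiplies the frame rate by k and inserts k - 1 new frames between each pair of consecutive source frames. The sketch below only illustrates that bookkeeping; the function name `interpolation_plan` and its return fields are hypothetical and not part of FC-VFI.

```python
def interpolation_plan(source_fps: int, factor: int, num_source_frames: int) -> dict:
    """Bookkeeping for k-times frame interpolation (illustrative helper, not FC-VFI code)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if num_source_frames < 0:
        raise ValueError("num_source_frames must be non-negative")

    # k-times interpolation inserts k - 1 frames into every gap between source frames.
    inserted_per_gap = factor - 1
    gaps = max(num_source_frames - 1, 0)

    return {
        "target_fps": source_fps * factor,
        "inserted_per_gap": inserted_per_gap,
        "total_frames": num_source_frames + gaps * inserted_per_gap,
    }


if __name__ == "__main__":
    # One second of 30 FPS video, sampled as 31 frames (30 gaps).
    print(interpolation_plan(30, 4, 31))
    print(interpolation_plan(30, 8, 31))
```

Under these assumptions, 8x interpolation of a 30 FPS source yields 240 FPS with seven generated frames per gap, so most frames in the output are synthesized rather than captured.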
Large pre-trained video diffusion models excel at video frame interpolation but struggle to generate high-fidelity frames because they rely on intrinsic generative priors, which limits how much detail is preserved from the start and end frames. Existing methods often depend on motion control for temporal consistency, yet dense optical flow is error-prone and sparse points lack structural context. In this paper, we propose FC-VFI for faithful and consistent video frame interpolation, supporting \(4\times\) and \(8\times\) interpolation, boosting frame rates from 30 FPS to 120 and 240 FPS at \(2560\times 1440\) resolution while preserving visual fidelity and motion consistency. We introduce a temporal modeling strategy on the latent sequences to inherit fidelity cues from the start and end frames, and we leverage semantic matching lines for structure-aware motion guidance, improving motion consistency. Furthermore, we propose a temporal difference loss to mitigate temporal inconsistencies. Extensive experiments show that FC-VFI achieves high performance and structural integrity across diverse scenarios.
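The abstract does not give the form of the temporal difference loss. As a rough illustration of the general idea, the sketch below compares frame-to-frame differences of a predicted sequence against those of a reference sequence using a mean absolute error. The L1 formulation, the function name `temporal_difference_loss`, and the `(T, H, W, C)` array layout are all assumptions made for this example and may differ from what FC-VFI actually uses.

```python
import numpy as np


def temporal_difference_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error between consecutive-frame differences.

    Assumed formulation for illustration only; not taken from the FC-VFI paper.
    Both inputs have shape (T, ...) with time on axis 0 and T >= 2.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    if pred.shape[0] < 2:
        raise ValueError("need at least two frames to form a temporal difference")

    # Differences between consecutive frames along the time axis.
    pred_diff = np.diff(pred, axis=0)
    target_diff = np.diff(target, axis=0)

    # Penalize mismatch in how the sequences change over time.
    return float(np.mean(np.abs(pred_diff - target_diff)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((8, 4, 4, 3))            # 8 frames, tiny 4x4 RGB
    flicker = target + 0.1 * rng.standard_normal(target.shape)
    print(temporal_difference_loss(target, target))   # identical sequences
    print(temporal_difference_loss(flicker, target))  # per-frame noise adds flicker
```

A loss of this shape ignores a constant offset shared by all frames and responds only to changes between frames, so it targets flicker rather than per-frame accuracy. In practice it would be combined with a per-frame reconstruction term.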