V-Bridge leverages pretrained video generative models for few-shot image restoration by reframing restoration as a progressive generative process. Trained on only 1,000 multi-task samples, a single pretrained video model achieves performance competitive with specialized architectures across multiple restoration tasks. This suggests that video generative models possess strong, transferable restoration priors that can be unlocked with minimal task-specific data.
Pretrained video models can achieve surprisingly strong few-shot image restoration, rivaling specialized architectures, by reframing restoration as a generative refinement process.
Large-scale video generative models are trained on vast and diverse visual data, enabling them to internalize rich structural, semantic, and dynamic priors of the visual world. While these models have demonstrated impressive generative capability, their potential as general-purpose visual learners remains largely untapped. In this work, we introduce V-Bridge, a framework that connects this latent capacity to versatile few-shot image restoration tasks. We reinterpret image restoration not as a static regression problem, but as a progressive generative process, and leverage video models to simulate the gradual refinement from degraded inputs to high-fidelity outputs. Surprisingly, with only 1,000 multi-task training samples (less than 2% of the training data used by existing restoration methods), pretrained video models can be induced to perform competitive image restoration, handling multiple tasks with a single model and rivaling specialized architectures designed explicitly for this purpose. Our findings reveal that video generative models implicitly learn powerful and transferable restoration priors that can be activated with extremely limited data, challenging the traditional boundary between generative modeling and low-level vision and opening a new design paradigm for foundation models in visual tasks.
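To make the idea of restoration as a progressive process concrete, the following is a minimal sketch of one way a degraded/clean image pair could be turned into a short frame sequence that moves gradually from the degraded input to the clean target. The abstract does not specify how V-Bridge constructs its sequences, so the linear blending schedule, the function name `make_refinement_clip`, and the `num_frames` parameter are all illustrative assumptions rather than the authors' method.

```python
import numpy as np


def make_refinement_clip(degraded, clean, num_frames=8):
    """Build a pseudo-video whose frames move from `degraded` to `clean`.

    Hypothetical helper: linear blending is an illustrative assumption,
    not the schedule used in the paper.
    """
    if degraded.shape != clean.shape:
        raise ValueError("degraded and clean must have the same shape")
    if num_frames < 2:
        raise ValueError("need at least two frames")
    # Blend weights run from 0 (fully degraded) to 1 (fully clean).
    weights = np.linspace(0.0, 1.0, num_frames).reshape(-1, 1, 1, 1)
    # Broadcast (T, 1, 1, 1) against (1, H, W, C) to get (T, H, W, C).
    return (1.0 - weights) * degraded[None] + weights * clean[None]


# Toy data: a random "clean" image and a noisy copy standing in for degradation.
rng = np.random.default_rng(0)
clean = rng.random((16, 16, 3))
degraded = np.clip(clean + 0.2 * rng.standard_normal(clean.shape), 0.0, 1.0)

clip = make_refinement_clip(degraded, clean, num_frames=8)
# Mean absolute error of each frame with respect to the clean target.
errors = np.abs(clip - clean[None]).mean(axis=(1, 2, 3))
```

In this sketch the first frame equals the degraded input, the last frame equals the clean target, and the per-frame error shrinks along the sequence. A video model trained on such clips would, under these assumptions, learn to continue a sequence from a degraded first frame toward a restored final frame.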