UQ · Apr 15, 2026 · arXiv:2604.13906

Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model

Shuyun Wang, Hu Zhang, Xin Shen, Dadong Wang, Xin Yu

AI Summary

This paper introduces a "blind" video recovery setting that removes the requirement for predefined masks of corrupted regions, a significant limitation of existing video restoration methods. The setting raises two challenges: identifying corrupted regions and recovering content from extensive, irregular degradations. To address them, the authors propose a Metadata-Guided Diffusion Model (M-GDM) that uses intrinsic video metadata (motion vectors and frame types) as corruption indicators. M-GDM also incorporates a prior-driven mask predictor to preserve intact regions and a post-refinement module to improve consistency between intact and recovered regions. The authors report that it outperforms existing methods in blind video recovery.

Key Contribution

No manual masks required: a diffusion model uses video metadata (motion vectors and frame types) to restore bitstream-corrupted video in the blind setting, where the authors report it outperforms existing methods.

Abstract

Bitstream-corrupted video recovery aims to restore realistic content degraded during video storage or transmission. Existing methods typically assume that predefined masks of corrupted regions are available, but manually annotating these masks is labor-intensive and impractical in real-world scenarios. To address this limitation, we introduce a new blind video recovery setting that removes the reliance on predefined masks. This setting presents two major challenges: accurately identifying corrupted regions and recovering content from extensive and irregular degradations. We propose a Metadata-Guided Diffusion Model (M-GDM) to tackle these challenges. Specifically, intrinsic video metadata are leveraged as corruption indicators through a dual-stream metadata encoder that separately embeds motion vectors and frame types before fusing them into a unified representation. This representation interacts with corrupted latent features via cross-attention at each diffusion step. To preserve intact regions, we design a prior-driven mask predictor that generates pseudo masks using both metadata and diffusion priors, enabling the separation and recombination of intact and recovered regions through hard masking. To mitigate boundary artifacts caused by imperfect masks, a post-refinement module enhances consistency between intact and recovered regions. Extensive experiments demonstrate the effectiveness of our method and its superiority in blind video recovery. Code is available at: https://github.com/Shuyun-Wang/M-GDM.
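The hard-masking step described in the abstract, where intact and recovered regions are separated and recombined using a pseudo mask, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `recombine_hard_mask`, the 0.5 threshold, and the convention that a mask value of 1 marks a corrupted pixel are all assumptions made for illustration.

```python
import numpy as np


def recombine_hard_mask(corrupted, recovered, mask_prob, threshold=0.5):
    """Compose an output frame from a corrupted frame and a recovered frame.

    corrupted, recovered: float arrays of shape (H, W, C).
    mask_prob: float array of shape (H, W), soft pseudo-mask scores
               (assumed convention: higher means "more likely corrupted").
    threshold: hypothetical cut-off used to binarize the soft mask.
    """
    # Binarize the soft mask: 1 = corrupted pixel, 0 = intact pixel.
    hard_mask = (mask_prob >= threshold).astype(corrupted.dtype)
    # Add a channel axis so the mask broadcasts over (H, W, C).
    hard_mask = hard_mask[..., None]
    # Take recovered content where corrupted, keep the original elsewhere.
    return hard_mask * recovered + (1.0 - hard_mask) * corrupted


# Toy example: a 2x2 RGB frame of zeros (stand-in for the corrupted input)
# and a frame of ones (stand-in for the diffusion output).
corrupted = np.zeros((2, 2, 3), dtype=np.float32)
recovered = np.ones((2, 2, 3), dtype=np.float32)
mask_prob = np.array([[0.9, 0.1], [0.4, 0.6]], dtype=np.float32)

output = recombine_hard_mask(corrupted, recovered, mask_prob)
```

Because the mask is binary, pixels judged intact pass through unchanged, which is why an imperfect mask can leave visible seams at region boundaries; the abstract's post-refinement module is aimed at that problem.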

Computer Vision · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
