Mar 16, 2026 · arXiv:2603.15153

TextOVSR: Text-Guided Real-World Opera Video Super-Resolution

Hua Chang, Xin Xu, Wei Liu, Jiayi Wu, Kui Jiang, Fei Ma, Qi Tian

AI Summary

The paper introduces TextOVSR, a text-guided dual-branch network for real-world opera video super-resolution. It addresses two obstacles to restoring old opera videos: complex real-world degradations and the lack of high-level semantic guidance. TextOVSR feeds degradation-descriptive text into a negative branch to constrain the solution space, and content-descriptive text into a positive branch and a Text-Enhanced Discriminator (TED) to provide semantic guidance. A Degradation-Robust Feature Fusion (DRF) module fuses text and video features while suppressing degradation interference. On the OperaLQ benchmark, TextOVSR outperforms existing RWVSR methods.
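As a rough illustration of how degradation-descriptive text could be derived from a synthetic degradation process, the sketch below samples blur, noise, JPEG-compression, and downsampling parameters and maps them to a short prompt. It is a minimal sketch under stated assumptions: the class `DegradationParams`, the functions `sample_degradation` and `degradation_text`, the parameter ranges, the level thresholds, and the prompt template are all hypothetical and are not taken from the TextOVSR paper or its repository.

```python
import random
from dataclasses import dataclass


@dataclass
class DegradationParams:
    blur_sigma: float  # Gaussian blur standard deviation in pixels (hypothetical)
    noise_std: float   # additive noise std on a 0-255 scale (hypothetical)
    jpeg_quality: int  # JPEG quality factor; lower means stronger compression
    scale: int         # downsampling factor


def sample_degradation(rng: random.Random) -> DegradationParams:
    # Placeholder ranges chosen for illustration only, not values from the paper.
    return DegradationParams(
        blur_sigma=round(rng.uniform(0.2, 3.0), 1),
        noise_std=round(rng.uniform(1.0, 25.0), 1),
        jpeg_quality=rng.randint(30, 95),
        scale=4,
    )


def _level(value: float, low: float, high: float) -> str:
    # Bucket a numeric strength into a coarse verbal level.
    if value < low:
        return "slight"
    if value < high:
        return "moderate"
    return "severe"


def degradation_text(p: DegradationParams) -> str:
    # Hypothetical prompt template built from the sampled parameters.
    parts = [
        f"{_level(p.blur_sigma, 1.0, 2.0)} blur",
        f"{_level(p.noise_std, 8.0, 16.0)} noise",
        # Invert JPEG quality so that a larger value means stronger compression.
        f"{_level(100 - p.jpeg_quality, 30, 55)} compression artifacts",
        f"{p.scale}x downsampling",
    ]
    return "a video frame with " + ", ".join(parts)
```

For example, `degradation_text(DegradationParams(0.5, 20.0, 40, 4))` returns `"a video frame with slight blur, severe noise, severe compression artifacts, 4x downsampling"`. Coarse verbal levels are used here on the assumption that a text encoder handles words such as "severe" more reliably than raw numbers; the paper's actual wording may differ.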

Key Contribution

TextOVSR uses two kinds of text prompts, one describing the degradation and one describing the content, to guide super-resolution of degraded opera videos. By explicitly modeling both degradation and content semantics, it outperforms state-of-the-art RWVSR methods on the OperaLQ benchmark.

Abstract

Many classic opera videos exhibit poor visual quality due to the limitations of early filming equipment and long-term degradation during storage. Although real-world video super-resolution (RWVSR) has achieved significant advances in recent years, directly applying existing methods to degraded opera videos remains challenging. The difficulties are twofold. First, accurately modeling real-world degradations is complex: simplistic combinations of classical degradation kernels fail to capture the authentic noise distribution, while methods that extract real noise patches from external datasets are prone to style mismatches that introduce visual artifacts. Second, current RWVSR methods, which rely solely on degraded image features, struggle to reconstruct realistic and detailed textures due to a lack of high-level semantic guidance. To address these issues, we propose a Text-guided Dual-Branch Opera Video Super-Resolution (TextOVSR) network, which introduces two types of textual prompts to guide the super-resolution process. Specifically, degradation-descriptive text, derived from the degradation process, is incorporated into the negative branch to constrain the solution space. Simultaneously, content-descriptive text is incorporated into a positive branch and our proposed Text-Enhanced Discriminator (TED) to provide semantic guidance for enhanced texture reconstruction. Furthermore, we design a Degradation-Robust Feature Fusion (DRF) module to facilitate cross-modal feature fusion while suppressing degradation interference. Experiments on our OperaLQ benchmark show that TextOVSR outperforms state-of-the-art methods both qualitatively and quantitatively. The code is available at https://github.com/ChangHua0/TextOVSR.
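The abstract does not specify how the DRF module fuses text and video features. The following is a minimal, hypothetical sketch of one common conditioning pattern, FiLM-style feature-wise scale and shift, applied per channel: the content-text embedding produces the scale and shift, and the degradation-text embedding produces a sigmoid gate that damps channels it marks as unreliable. The function names, weight matrices, and gating rule are assumptions for illustration only, not the paper's DRF design; plain Python lists stand in for tensors so the example runs without dependencies.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    # Dense matrix-vector product: one output value per row of w.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def fuse(
    features: Vector,
    content_emb: Vector,
    degradation_emb: Vector,
    w_scale: Matrix,
    w_shift: Matrix,
    w_gate: Matrix,
) -> Vector:
    """Hypothetical text-conditioned fusion of per-channel visual features.

    content_emb     -> FiLM-style per-channel scale and shift
    degradation_emb -> per-channel gate in (0, 1); higher means "more corrupted"
    """
    scale = matvec(w_scale, content_emb)
    shift = matvec(w_shift, content_emb)
    gate = [sigmoid(g) for g in matvec(w_gate, degradation_emb)]
    fused = []
    for f, s, b, g in zip(features, scale, shift, gate):
        modulated = f * (1.0 + s) + b        # content-guided modulation
        fused.append((1.0 - g) * modulated)  # damp channels flagged as degraded
    return fused
```

With all weights set to zero, the scale and shift vanish and every gate equals sigmoid(0) = 0.5, so `fuse` simply halves each feature; in a trained network these projections would be learned end to end.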

Computer Vision · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
