GAP-URGENet, a novel generative-predictive fusion framework, was developed for universal speech enhancement by combining self-supervised speech restoration with spectrogram-domain enhancement. The generative branch reconstructs the waveform using a neural vocoder, while the predictive branch provides complementary cues in the spectrogram domain. Fusing these branches with a post-processing module yields improved robustness and perceptual quality, achieving state-of-the-art performance in the ICASSP 2026 URGENT Challenge.
Generative and predictive models can be fused to achieve state-of-the-art speech enhancement, outperforming single-branch approaches in robustness and perceptual quality.
We introduce GAP-URGENet, a generative-predictive fusion framework developed for Track 1 of the ICASSP 2026 URGENT Challenge. The system integrates a generative branch, which performs full-stack speech restoration in a self-supervised representation domain and reconstructs the waveform via a neural vocoder, with a predictive branch that performs spectrogram-domain enhancement and provides complementary cues. Outputs from both branches are fused by a post-processing module, which also performs bandwidth extension to generate the enhanced waveform at 48 kHz; the result is then downsampled to the original sampling rate. This generative-predictive fusion improves robustness and perceptual quality, achieving top performance in the blind-test phase and ranking 1st in the objective evaluation. Audio examples are available at https://xiaobin-rong.github.io/gap-urgenet_demo.
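The pipeline's final step produces a 48 kHz waveform that is then downsampled back to the input's original sampling rate. The abstract does not specify the resampler used; as a minimal sketch of that last step only (integer-factor decimation with a simple moving-average anti-alias filter, not the authors' implementation), it might look like:

```python
import numpy as np

def downsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Downsample by an integer factor with a crude anti-alias filter.

    A moving average over `factor` samples attenuates high frequencies
    before decimation; production systems would use a proper polyphase
    resampler instead.
    """
    kernel = np.ones(factor) / factor
    filtered = np.convolve(x, kernel, mode="same")  # same length as x
    return filtered[::factor]  # keep every `factor`-th sample

# Example: bring a 48 kHz enhanced waveform down to 16 kHz (factor 3).
sr_enh, sr_orig = 48_000, 16_000
t = np.arange(sr_enh) / sr_enh            # one second of audio
x48 = np.sin(2 * np.pi * 440.0 * t)       # stand-in for the enhanced output
x16 = downsample(x48, factor=sr_enh // sr_orig)
assert x16.shape[0] == sr_orig
```

This is only illustrative; the challenge setting (arbitrary original rates up to 48 kHz) would in practice require rational-factor resampling rather than pure integer decimation.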