Mar 16, 2026 | arXiv:2603.14986

Deep Filter Estimation from Inter-Frame Correlations for Monaural Speech Dereverberation

Ui-Hyeop Shin, Jun Hyung Kim, Jangyeon Kim, Wooseok Kim, Hyung-Min Park

AI Summary

This paper introduces IF-CorrNet, a neural network architecture for monaural speech dereverberation that estimates multi-frame deep filters for each time-frequency bin by explicitly exploiting inter-frame STFT correlations. Shifting the learning objective from direct spectral mapping to filter estimation constrains the solution space and is intended to improve generalization to real-world reverberant environments. On the REVERB Challenge dataset, IF-CorrNet achieves a substantial SRMR gain on the RealData subset, which the authors take as evidence of robust reverberation and noise suppression outside synthetic conditions.

Key Contribution

By learning to estimate multi-frame filters from inter-frame STFT correlations rather than mapping spectra directly, IF-CorrNet achieves a substantial SRMR gain on REVERB Challenge RealData and mitigates the overfitting to synthetic data that limits direct spectral mapping approaches.

Abstract

Speech dereverberation in distant-microphone scenarios remains challenging due to the high correlation between reverberation and target signals, often leading to poor generalization in real-world environments. We propose IF-CorrNet, a correlation-to-filter architecture designed for robustness against acoustic variability. Unlike conventional black-box mapping methods that directly estimate complex spectra, IF-CorrNet explicitly exploits inter-frame STFT correlations to estimate multi-frame deep filters for each time-frequency bin. By shifting the learning objective from direct mapping to filter estimation, the network effectively constrains the solution space, which simplifies the training process and mitigates overfitting to synthetic data. Experimental results on the REVERB Challenge dataset demonstrate that IF-CorrNet achieves a substantial gain in the SRMR metric on RealData, confirming its robustness in suppressing reverberation and noise in practical, non-synthetic environments.
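The two operations named in the abstract, forming inter-frame STFT correlations and applying a multi-frame filter at each time-frequency bin, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of instantaneous (unsmoothed, unnormalized) correlation products, the causal tap ordering, and the zero-padding at the start of the signal are all hypothetical choices, and the network that maps correlations to filter coefficients is omitted entirely.

```python
import numpy as np

def inter_frame_correlations(X, num_taps):
    """Instantaneous inter-frame correlations of a complex STFT.

    X has shape (frames, freqs). Returns C with shape
    (num_taps, frames, freqs), where C[k, t, f] = X[t, f] * conj(X[t-k, f])
    and entries with t < k are left at zero.
    Assumption: the paper may use a different (e.g. smoothed) estimator.
    """
    T, F = X.shape
    C = np.zeros((num_taps, T, F), dtype=complex)
    for k in range(num_taps):
        C[k, k:, :] = X[k:, :] * np.conj(X[:T - k, :])
    return C

def apply_deep_filter(X, W):
    """Apply a multi-frame filter to each time-frequency bin.

    W has shape (num_taps, frames, freqs). The output is
    Y[t, f] = sum_k W[k, t, f] * X[t-k, f], with frames before the
    start of the signal treated as zero (an assumption).
    """
    K, T, F = W.shape
    Y = np.zeros((T, F), dtype=complex)
    for k in range(K):
        Y[k:, :] += W[k, k:, :] * X[:T - k, :]
    return Y

# Hypothetical usage: in a correlation-to-filter model, a network would
# take C as input and predict W; here W is a fixed identity filter.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5)) + 1j * rng.standard_normal((20, 5))
C = inter_frame_correlations(X, num_taps=3)
W = np.zeros((3, 20, 5), dtype=complex)
W[0] = 1.0  # tap 0 passes the current frame through unchanged
Y = apply_deep_filter(X, W)
```

Because the output at each bin is restricted to a weighted sum of a few past frames of the observed signal, the filter formulation admits fewer possible outputs than unconstrained spectral mapping, which is the sense in which the abstract says the solution space is constrained.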

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
