This paper introduces IF-CorrNet, a novel neural network architecture for monaural speech dereverberation that estimates multi-frame deep filters by explicitly exploiting inter-frame STFT correlations. By shifting the learning objective from direct spectral mapping to filter estimation based on inter-frame correlations, IF-CorrNet constrains the solution space and improves generalization to real-world reverberant environments. Experiments on the REVERB Challenge dataset show that IF-CorrNet achieves significant SRMR gains on real data, demonstrating its robustness in suppressing reverberation and noise.
By learning to estimate filters from inter-frame correlations rather than mapping spectra directly, IF-CorrNet improves monaural speech dereverberation in real-world environments and mitigates the generalization issues of direct spectral mapping approaches.
Speech dereverberation in distant-microphone scenarios remains challenging due to the high correlation between reverberation and target signals, often leading to poor generalization in real-world environments. We propose IF-CorrNet, a correlation-to-filter architecture designed for robustness against acoustic variability. Unlike conventional black-box mapping methods that directly estimate complex spectra, IF-CorrNet explicitly exploits inter-frame STFT correlations to estimate multi-frame deep filters for each time-frequency bin. By shifting the learning objective from direct mapping to filter estimation, the network effectively constrains the solution space, which simplifies the training process and mitigates overfitting to synthetic data. Experimental results on the REVERB Challenge dataset demonstrate that IF-CorrNet achieves a substantial gain in the SRMR metric on RealData, confirming its robustness in suppressing reverberation and noise in practical, non-synthetic environments.
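To make the correlation-to-filter idea concrete, the following is a minimal NumPy sketch of the two signal-processing pieces the abstract names: computing inter-frame STFT correlations and applying a multi-frame filter at each time-frequency bin. It is not the authors' implementation. The function names, the lag-product definition of correlation, the causal tap layout, the absence of normalization, and the filter convention (no conjugation of the taps) are all assumptions made for illustration. The network that maps the correlation features to the filter taps is omitted entirely.

```python
import numpy as np


def inter_frame_correlation(X, num_lags):
    """Correlation of each STFT bin with its own past frames.

    X: complex STFT of shape (T, F).
    Returns C of shape (num_lags, T, F) where
    C[k, t, f] = X[t, f] * conj(X[t - k, f]), and zero where t - k < 0.
    (Assumed definition; the paper may normalize or smooth differently.)
    """
    T, F = X.shape
    C = np.zeros((num_lags, T, F), dtype=complex)
    for k in range(num_lags):
        C[k, k:] = X[k:] * np.conj(X[:T - k])
    return C


def apply_multiframe_filter(X, H):
    """Apply per-bin multi-frame filter taps to the STFT.

    H: complex taps of shape (K, T, F).
    Returns Y of shape (T, F) with Y[t, f] = sum_k H[k, t, f] * X[t - k, f],
    treating frames before the start of the signal as zero.
    """
    K, T, F = H.shape
    Y = np.zeros((T, F), dtype=complex)
    for k in range(K):
        Y[k:] += H[k, k:] * X[:T - k]
    return Y


# Usage with random data standing in for a reverberant STFT.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 129)) + 1j * rng.standard_normal((50, 129))
C = inter_frame_correlation(X, num_lags=5)   # features a network could consume

# Hypothetical taps: an identity filter (only the current-frame tap is 1).
H = np.zeros((5, 50, 129), dtype=complex)
H[0] = 1.0
Y = apply_multiframe_filter(X, H)            # equals X for the identity filter
```

Because the output is restricted to a weighted sum of a few neighboring frames of the observed spectrum, the estimator cannot produce arbitrary spectra, which is one way to read the abstract's claim that filter estimation constrains the solution space.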