This paper introduces a speech separation system that combines blind source separation (BSS) with cepstral smoothing of binary time-frequency masks to extract two speech signals from two microphone recordings. The system estimates binary masks from the BSS output and then applies cepstral smoothing to reduce musical noise, a common artifact in time-frequency masking approaches. Experiments using both simulated and real speech mixtures demonstrate the effectiveness of the proposed system in improving speech separation quality.
Cepstral smoothing of binary masks can significantly reduce musical noise in blind source separation, leading to cleaner speech extraction from mixed audio.
In this paper, we propose a novel separation system for extracting two speech signals from two microphone recordings. Our system combines blind source separation (BSS) with cepstral smoothing of binary time-frequency masks. The latter consists of two steps. First, two binary masks are estimated from the separated output signals of the BSS algorithm. Second, cepstral smoothing is applied to these spectral masks in order to reduce the musical noise typically produced by time-frequency masking. Experiments were carried out with both artificially mixed speech signals, generated with a simulated room model, and two real recordings. The evaluation results are promising and show the effectiveness of our system.
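The two steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the mask decision rule or the smoothing parameters, so everything specific here is an assumption made for illustration: the function names `binary_masks` and `cepstral_smooth`, the "larger magnitude wins" rule, the first-order recursive smoothing over frames, and the values of `n_low`, `beta_low`, `beta_high`, and `floor`. The sketch also omits any special handling of pitch-related quefrencies, so it should not be read as the authors' implementation.

```python
import numpy as np

def binary_masks(Y1, Y2):
    """Estimate two complementary binary masks from the BSS outputs.

    Y1, Y2: spectrograms of the two separated signals, shape (n_bins, n_frames).
    Assumed rule: a time-frequency bin belongs to the output with the larger magnitude.
    """
    m1 = (np.abs(Y1) > np.abs(Y2)).astype(float)
    return m1, 1.0 - m1

def cepstral_smooth(mask, n_low=8, beta_low=0.0, beta_high=0.8, floor=1e-2):
    """Smooth a mask over time in the cepstral domain (illustrative parameters).

    Low quefrencies (coarse spectral envelope) are smoothed with beta_low,
    higher quefrencies (fine structure, where isolated mask peaks live)
    with beta_high.
    """
    n_bins, n_frames = mask.shape
    n_fft = 2 * (n_bins - 1)

    # Floor the mask so the logarithm is defined at zero-valued bins.
    log_mask = np.log(np.maximum(mask, floor))

    # Cepstrum of each frame: inverse real FFT of the log mask along frequency.
    ceps = np.fft.irfft(log_mask, n=n_fft, axis=0)

    # Symmetric quefrency index, so the smoothed cepstrum stays even.
    idx = np.arange(n_fft)
    quefrency = np.minimum(idx, n_fft - idx)
    beta = np.where(quefrency < n_low, beta_low, beta_high)

    # First-order recursive smoothing across frames, per quefrency bin.
    smoothed = np.empty_like(ceps)
    smoothed[:, 0] = ceps[:, 0]
    for t in range(1, n_frames):
        smoothed[:, t] = beta * smoothed[:, t - 1] + (1.0 - beta) * ceps[:, t]

    # Back to the spectral domain and to a soft mask in [0, 1].
    out = np.exp(np.fft.rfft(smoothed, axis=0).real)
    return np.clip(out, 0.0, 1.0)
```

In this sketch the smoothed masks would be multiplied with the corresponding BSS output spectrograms before resynthesis. Setting `beta_high` to 0 disables the smoothing and returns the floored binary mask, which makes it easy to compare the masked output with and without smoothing.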