Mar 16, 2026 arXiv:2603.14983

Cepstral Smoothing of Binary Masks for Convolutive Blind Separation of Speech Mixtures

AI Summary

This paper introduces a speech separation system that combines blind source separation (BSS) with cepstral smoothing of binary time-frequency masks to extract two speech signals from two microphone recordings. The system estimates binary masks from the BSS output and then applies cepstral smoothing to reduce musical noise, a common artifact in time-frequency masking approaches. Experiments using both simulated and real speech mixtures demonstrate the effectiveness of the proposed system in improving speech separation quality.

Key Contribution

Cepstral smoothing of binary masks reduces the musical noise that time-frequency masking introduces after blind source separation, leading to cleaner speech extraction from two-microphone mixtures.

Abstract

In this paper, we propose a novel separation system for extracting two speech signals from two microphone recordings. Our system combines blind source separation (BSS) with cepstral smoothing of binary time-frequency masks. The latter consists of two steps. First, the two binary masks are estimated from the separated output signals of the BSS algorithm. Second, cepstral smoothing is applied to these spectral masks in order to reduce the musical noise typically produced by time-frequency masking. Experiments were carried out with both artificially mixed speech signals, generated with a simulated room model, and two real recordings. The evaluation results are promising and show the effectiveness of our system.
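
The two masking steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the mask rule or the smoothing parameters, so the magnitude comparison in `binary_masks`, the single smoothing constant `beta`, and the log floor `floor` are all assumptions made here. Published cepstral smoothing schemes usually vary the smoothing constant with quefrency; one constant is used below for brevity. The inputs `Y1` and `Y2` are assumed to be complex STFTs (frequency bins by frames) of the two BSS outputs.

```python
import numpy as np

def binary_masks(Y1, Y2):
    """Step 1 (assumed rule): assign each time-frequency cell to the
    BSS output with the larger magnitude. Ties go to the second mask."""
    M1 = (np.abs(Y1) > np.abs(Y2)).astype(float)
    return M1, 1.0 - M1

def cepstral_smooth(mask, beta=0.6, floor=1e-3):
    """Step 2 (simplified): first-order recursive smoothing over time,
    applied to the cepstrum of the log-mask instead of the mask itself.

    mask  : array of shape (n_bins, n_frames), n_bins = n_fft // 2 + 1
    beta  : smoothing constant in [0, 1); hypothetical default
    floor : lower bound applied before the log; hypothetical default
    """
    n_bins, n_frames = mask.shape
    n_fft = 2 * (n_bins - 1)
    # Flooring avoids log(0) for the zero entries of a binary mask.
    log_mask = np.log(np.maximum(mask, floor))
    # Cepstrum of each frame: inverse real FFT along the frequency axis.
    ceps = np.fft.irfft(log_mask, n=n_fft, axis=0)
    smoothed = np.empty_like(ceps)
    smoothed[:, 0] = ceps[:, 0]
    for t in range(1, n_frames):
        smoothed[:, t] = beta * smoothed[:, t - 1] + (1.0 - beta) * ceps[:, t]
    # Back to the spectral domain, then undo the log.
    return np.exp(np.fft.rfft(smoothed, axis=0).real)
```

The smoothed masks take values between the floor and 1 rather than exactly 0 or 1, and would be multiplied with the corresponding BSS output spectra before resynthesis. Setting `beta=0` disables the smoothing and returns the floored binary mask.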

Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
