CMU ML | Apr 9, 2026 | arXiv:2604.08415

Ring Mixing with Auxiliary Signal-to-Consistency-Error Ratio Loss for Unsupervised Denoising in Speech Separation

Matthew Maciejewski, Samuele Cornell

AI Summary

The paper addresses the problem of training speech separation systems on noisy, real-world mixtures, where standard losses lead the estimates to retain background noise. The authors introduce "ring mixing," a batching strategy in which each source appears in two mixtures, paired with a Signal-to-Consistency-Error Ratio (SCER) auxiliary loss that penalizes inconsistent estimates of the same source across those mixtures. Experiments on a WHAM!-based benchmark show that this approach reduces residual noise by upwards of half, so the model learns to denoise from noisy recordings alone, and it makes training on naturally noisy, in-the-wild speech such as VoxCeleb practical.
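To make the batching idea concrete, here is a minimal Python sketch of one way to build ring-mixed batches. It assumes, based only on the name, that the N noisy sources in a batch are arranged in a ring and that mixture i is formed from source i and source (i + 1) mod N, so every source lands in exactly two mixtures. `ring_mix` and `appearances` are hypothetical helpers, not the authors' code; signals are plain lists of floats to keep the sketch dependency-free, and a real pipeline would also handle gain scaling and length alignment.

```python
from typing import Dict, List

# A signal is a plain list of samples so the sketch needs no third-party packages.
Signal = List[float]


def ring_mix(sources: List[Signal]) -> List[Signal]:
    """Form one mixture per source by pairing source i with source (i + 1) mod N.

    Assumption (hypothetical, inferred from the name "ring mixing"): sources are
    arranged in a ring, so each source appears in exactly two mixtures.
    """
    n = len(sources)
    if n < 3:
        # With N = 2 both mixtures would contain the same pair of sources.
        raise ValueError("ring mixing needs at least 3 sources")
    mixtures: List[Signal] = []
    for i in range(n):
        a, b = sources[i], sources[(i + 1) % n]
        if len(a) != len(b):
            raise ValueError("sources must have equal length")
        # Sample-wise sum of the two (already noisy) recordings.
        mixtures.append([x + y for x, y in zip(a, b)])
    return mixtures


def appearances(n: int) -> Dict[int, List[int]]:
    """Return, for each source index, the indices of the two mixtures containing it."""
    # Source i is the first member of mixture i and the second member of mixture i - 1.
    return {i: sorted({i, (i - 1) % n}) for i in range(n)}


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(ring_mix(batch))   # [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
    print(appearances(3))    # {0: [0, 2], 1: [0, 1], 2: [1, 2]}
```

The sketch rejects N < 3 because with only two sources both mixtures would be the same pair, so the two appearances of a source would carry no new information.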

Key Contribution

Training speech separation models on real-world noisy data doesn't have to mean accepting noisy outputs: on a WHAM!-based benchmark, this method cuts residual noise by upwards of half.

Abstract

Noisy speech separation systems are typically trained on fully-synthetic mixtures, limiting generalization to real-world scenarios. Though training on mixtures of in-domain (thus often noisy) speech is possible, we show that this leads to undesirable optima where mixture noise is retained in the estimates, due to the inseparability of the background noises and the loss function's symmetry. To address this, we propose ring mixing, a batching strategy that uses each source in two mixtures, alongside a new Signal-to-Consistency-Error Ratio (SCER) auxiliary loss that penalizes inconsistent estimates of the same source from different mixtures, breaking the symmetry and incentivizing denoising. On a WHAM!-based benchmark, our method can reduce residual noise by upwards of half, effectively learning to denoise from only noisy recordings. This opens the door to training more generalizable systems using in-the-wild data, which we demonstrate via systems trained using naturally-noisy speech from VoxCeleb.
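The consistency term can be sketched in the same hedged spirit. The abstract gives no formula, so the definition below is an assumption modeled on SNR-style losses: the signal power is the mean energy of the two estimates of a source, the consistency error is the energy of their difference, and the auxiliary loss is the negative mean SCER (in dB) over matched estimate pairs. One plausible intuition for why this encourages denoising: if an estimate keeps the background noise of the mixture it was extracted from, the two estimates of the same source carry noise from different partner recordings and disagree, whereas estimates of the clean speech can agree. `scer` and `scer_loss` are hypothetical names, and the pairs are assumed to be already matched to the same source.

```python
import math
from typing import Sequence, Tuple


def scer(est_a: Sequence[float], est_b: Sequence[float], eps: float = 1e-8) -> float:
    """Hypothetical Signal-to-Consistency-Error Ratio, in dB.

    Assumed definition (not taken from the paper): the "signal" is the mean
    energy of the two estimates and the "consistency error" is the energy of
    their difference. Higher means the two estimates agree more closely.
    """
    if len(est_a) != len(est_b):
        raise ValueError("estimates must have equal length")
    signal = 0.5 * (sum(x * x for x in est_a) + sum(y * y for y in est_b))
    error = sum((x - y) ** 2 for x, y in zip(est_a, est_b))
    # eps keeps the ratio finite when the estimates are identical or silent.
    return 10.0 * math.log10((signal + eps) / (error + eps))


def scer_loss(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Auxiliary loss to minimize: negative mean SCER over matched estimate pairs.

    Each pair holds two estimates of the same source, taken from the two
    different mixtures that source appears in (matching is assumed done).
    """
    if not pairs:
        raise ValueError("need at least one estimate pair")
    return -sum(scer(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    consistent = ([1.0, 2.0], [1.0, 2.1])
    inconsistent = ([1.0, 2.0], [2.0, 1.0])
    print(round(scer(*consistent), 2))    # higher: estimates nearly agree
    print(round(scer(*inconsistent), 2))  # lower: estimates disagree
```

In training, a term like this would presumably be added, with some weight, to the usual separation loss; the weighting is not specified in the abstract and is left as an assumption here.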

Data Curation & Synthetic Data, Speech & Audio, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 23

Year: 2026

Venue: N/A
