This paper investigates how data augmentation (DA) strategies affect speech enhancement (SE) performance for pathological speech, where data is limited and acoustics are atypical. The authors evaluate transformative, generative, and noise augmentation techniques using both predictive and generative SE models. Results indicate that noise augmentation provides the largest and most consistent improvements, that generative augmentation yields limited benefits and can harm performance as the amount of synthetic data increases, and that the effectiveness of DA depends on the SE model, with predictive models benefiting more.
Noise augmentation consistently delivers the largest and most robust gains for pathological speech enhancement, outperforming transformative and generative approaches.
The performance of state-of-the-art speech enhancement (SE) models considerably degrades for pathological speech due to atypical acoustic characteristics and limited data availability. This paper systematically investigates data augmentation (DA) strategies to improve SE performance for pathological speakers, evaluating both predictive and generative SE models. We examine three DA categories, i.e., transformative, generative, and noise augmentation, assessing their impact with objective SE metrics. Experimental results show that noise augmentation consistently delivers the largest and most robust gains, transformative augmentations provide moderate improvements, while generative augmentation yields limited benefits and can harm performance as the amount of synthetic data increases. Furthermore, we show that the effectiveness of DA varies depending on the SE model, with DA being more beneficial for predictive SE models. While our results demonstrate that DA improves SE performance for pathological speakers, a performance gap between neurotypical and pathological speech persists, highlighting the need for future research on targeted DA strategies for pathological speech.
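The abstract does not describe how noise augmentation is implemented, so the following is only a minimal sketch of one common approach: mixing a noise recording into clean speech at a randomly drawn signal-to-noise ratio (SNR) to create a (noisy, clean) training pair. The function names `rms`, `mix_at_snr`, and `augment`, and the 0 to 20 dB default SNR range, are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper: both inputs are equal-length sequences of floats.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_rms = rms(speech)
    noise_rms = rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        raise ValueError("speech and noise must be non-silent")
    # SNR_dB = 20 * log10(speech_rms / (gain * noise_rms)); solve for gain.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


def augment(speech, noise, snr_range_db=(0.0, 20.0), rng=random):
    """Return a (noisy, clean) pair with the SNR drawn uniformly from the range.

    The default SNR range is an assumption, not a value from the paper.
    """
    snr_db = rng.uniform(*snr_range_db)
    return mix_at_snr(speech, noise, snr_db), speech
```

Under these assumptions, each clean utterance can be paired with different noise segments and SNRs across epochs, which enlarges the effective training set without synthesizing new speech.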