This paper introduces a three-stage data augmentation framework to improve dysarthric speech quality assessment (DSQA) by leveraging unlabeled dysarthric speech and large-scale typical speech datasets. The framework uses a teacher model to generate pseudo-labels, followed by weakly supervised pretraining with label-aware contrastive learning, and finally fine-tuning for DSQA. Experiments on five unseen datasets show the approach significantly outperforms existing DSQA predictors, achieving an average SRCC of 0.761.
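The headline number here is an average SRCC (Spearman rank correlation coefficient), which measures how well the predicted quality scores preserve the *ranking* of human ratings rather than their exact values. A minimal stdlib-only sketch of the computation (ties are ignored for brevity; a production implementation such as `scipy.stats.spearmanr` handles them with averaged ranks):

```python
def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Note: this simple sketch assigns distinct ranks by sort order,
    so tied values are not averaged as a full implementation would.
    """
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Because SRCC depends only on ranks, it is robust to the monotonic calibration differences that often arise between listener panels, which is presumably why it is the metric of choice for quality-assessment benchmarks.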
Overcome the scarcity of labeled data in dysarthric speech quality assessment with a novel data augmentation framework that leverages unlabeled data and outperforms state-of-the-art methods.
Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms state-of-the-art DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.
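The abstract does not spell out the label-aware contrastive objective, so the following is only one plausible reading, sketched under stated assumptions: batch samples whose teacher pseudo-labels fall within a hypothetical `margin` of the anchor's label are treated as positives, and a SupCon-style softmax loss over cosine similarities pulls them together. The function names, `margin`, and temperature `tau` are illustrative, not from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as Python lists."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def label_aware_contrastive_loss(embs, pseudo_labels, tau=0.1, margin=0.5):
    """Hypothetical label-aware contrastive loss.

    For each anchor, other batch samples whose pseudo-label differs by
    at most `margin` count as positives; the loss is the negative log
    of the softmax mass those positives receive (SupCon-style).
    """
    n = len(embs)
    total, count = 0.0, 0
    for i in range(n):
        sims = [math.exp(cosine(embs[i], embs[j]) / tau)
                for j in range(n) if j != i]
        pos = [math.exp(cosine(embs[i], embs[j]) / tau)
               for j in range(n)
               if j != i and abs(pseudo_labels[i] - pseudo_labels[j]) <= margin]
        if not pos:
            continue  # anchors with no positives contribute nothing
        total += -math.log(sum(pos) / sum(sims))
        count += 1
    return total / max(count, 1)
```

Under this reading, embeddings of utterances with similar pseudo-quality cluster together during pretraining, so the subsequent fine-tuning stage starts from a representation already organized along the quality axis.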