This study augments the assessment of dysarthric speech with data from speech synthesis evaluations, specifically human-annotated utterances with Mean Opinion Score (MOS) labels from the QualiSpeech corpus. Fine-tuning on this synthesis assessment data consistently improved prediction of both intelligibility and naturalness of dysarthric speech, while joint training yielded gains primarily on naturalness. These findings suggest that synthesis artifacts and dysarthric speech share perceptual commonalities, which can be used to reduce reliance on scarce clinical annotations.
Fine-tuning on speech synthesis assessment data consistently improves intelligibility and naturalness prediction for dysarthric speech, making synthesis evaluation corpora a practical augmentation source when clinical annotations are scarce.
Dysarthria is a speech disorder marked by reduced intelligibility and communicative effectiveness. Automatic utterance-level assessment of dysarthric speech can support scalable speech monitoring and therapy-related analysis. Yet training such systems is bottlenecked by the scarcity of clinically annotated dysarthric speech. This work proposes to augment dysarthric speech assessment using data from speech synthesis evaluations, specifically human-annotated utterances with Mean Opinion Score (MOS) labels from the QualiSpeech corpus. Experiments show that fine-tuning on speech synthesis assessment data consistently improves performance on both intelligibility and naturalness prediction, while joint training yields gains primarily on naturalness. These results suggest that synthesis artifacts and dysarthric speech share perceptual commonalities, and speech synthesis evaluation corpora offer a practical augmentation source that reduces reliance on scarce clinical annotations.
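The abstract contrasts two ways of using the synthesis data but gives no architecture or training details. As a rough illustration only, the sketch below assumes that "fine-tuning" means training on synthesis MOS data first and then on dysarthric data, and that "joint training" means training on a shuffled mix of both. A toy linear regressor over made-up feature vectors stands in for a real speech model. All names (`make_toy_set`, `sequential_schedule`, `joint_schedule`) and all numbers are hypothetical and are not taken from the paper or from QualiSpeech.

```python
import random


def predict(w, x):
    """Linear score prediction: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_epoch(w, data, lr=0.05):
    """One pass of per-sample SGD on squared error."""
    for x, y in data:
        err = predict(w, x) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def make_toy_set(n, true_w, noise, rng):
    """Hypothetical stand-in for (utterance features, perceptual score) pairs."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = predict(true_w, x) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def sequential_schedule(synth, clinical, dim, epochs=20):
    """Assumed 'fine-tuning' setup: synthesis data first, then clinical data."""
    w = [0.0] * dim
    for _ in range(epochs):
        w = sgd_epoch(w, synth)
    for _ in range(epochs):
        w = sgd_epoch(w, clinical)
    return w


def joint_schedule(synth, clinical, dim, epochs=20, seed=0):
    """Assumed 'joint training' setup: one shuffled pool of both sources."""
    rng = random.Random(seed)
    mixed = list(synth) + list(clinical)
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(mixed)
        w = sgd_epoch(w, mixed)
    return w


def mse(w, data):
    """Mean squared error of the linear predictor on a dataset."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


# Toy data: a large "synthesis" set with a related but not identical
# feature-to-score mapping, and a small "clinical" set (all values invented).
rng = random.Random(42)
synth = make_toy_set(200, [0.9, -0.4, 0.35], 0.1, rng)
clinical_train = make_toy_set(20, [1.0, -0.5, 0.3], 0.1, rng)
clinical_test = make_toy_set(50, [1.0, -0.5, 0.3], 0.1, rng)

w_seq = sequential_schedule(synth, clinical_train, dim=3)
w_joint = joint_schedule(synth, clinical_train, dim=3)
print("sequential MSE on clinical test:", mse(w_seq, clinical_test))
print("joint MSE on clinical test:", mse(w_joint, clinical_test))
```

The toy numbers say nothing about the paper's results. The sketch only makes the difference between the two schedules concrete: the sequential version ends with updates on clinical data alone, while the joint version lets the larger synthesis set dominate each epoch.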