This paper introduces CoSTA, a data augmentation framework that uses Cognitive-State-Conditioned (CS-Cond) Text-to-Speech (TTS) models to improve speech-based detection of Alzheimer's Disease (AD) when pathological speech data are scarce. The authors adapt CosyVoice2 and F5-TTS to synthesize speech with distinct AD and Healthy Control characteristics. They also compare text sources for TTS augmentation and find that augmentation driven by Automatic Speech Recognition (ASR) transcripts frequently outperforms augmentation driven by manual transcripts. CoSTA yields a 4.16% gain over the baseline and reaches an audio-only accuracy of 85.83% on the ADReSS test set, outperforming prior methods.
CoSTA's TTS-based data augmentation boosts Alzheimer's detection accuracy by 4.16% over the baseline, showing the potential of synthetic speech for clinical diagnostics.
Speech-based Alzheimer's Disease (AD) detection is constrained by scarce pathological speech data. To address this, we propose CoSTA, a Text-to-Speech (TTS)-based data augmentation framework. Specifically, we first develop two Cognitive-State-Conditioned (CS-Cond) TTS models by adapting CosyVoice2 and F5-TTS to synthesize speech with distinct AD and Healthy Control characteristics. Furthermore, by constructing a transcript pool comprising Manual Transcripts (MT) and 36 Automatic Speech Recognition (ASR) transcripts, we investigate the impact of text sources on TTS-based augmentation. We also perform augmentation-factor analysis and test-time augmentation. Experiments on the ADReSS dataset show that CS-Cond TTS significantly improves synthetic speech utility, and ASR-driven augmentation frequently outperforms MT-driven augmentation. Finally, CoSTA yields a 4.16% gain over the baseline, achieving an audio-only accuracy of 85.83% on the ADReSS test set and outperforming prior methods.
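The augmentation-factor and test-time-augmentation ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. `Sample`, `build_augmented_set`, `tta_predict`, and the stub `fake_tts` are hypothetical names, and the TTS model and classifier are passed in as plain callables. The abstract does not say how transcripts are paired with speakers, so this sketch draws them uniformly from the pool. The test-time part shows one common form of TTA, averaging the predicted probability of AD over several views of an input, which may differ from the paper's procedure.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    audio: List[float]  # placeholder waveform
    label: str          # "AD" or "HC"


def build_augmented_set(
    real: Sequence[Sample],
    transcript_pool: Sequence[str],
    tts: Callable[[str, str], List[float]],
    factor: int,
    seed: int = 0,
) -> List[Sample]:
    """Keep the real samples and add `factor` synthetic utterances per real
    sample. Each synthetic utterance uses a transcript drawn uniformly from
    the pool (an assumption) and is conditioned on the real sample's label."""
    rng = random.Random(seed)
    out = list(real)
    for s in real:
        for _ in range(factor):
            text = rng.choice(transcript_pool)
            out.append(Sample(audio=tts(text, s.label), label=s.label))
    return out


def tta_predict(
    audio: List[float],
    views: Sequence[Callable[[List[float]], List[float]]],
    prob_ad: Callable[[List[float]], float],
) -> str:
    """Generic test-time augmentation: average P(AD) over several views."""
    scores = [prob_ad(view(audio)) for view in views]
    return "AD" if sum(scores) / len(scores) >= 0.5 else "HC"


def fake_tts(text: str, label: str) -> List[float]:
    # Hypothetical stand-in for a cognitive-state-conditioned TTS model.
    return [float(len(text)), 1.0 if label == "AD" else 0.0]


real = [Sample([0.0], "AD"), Sample([0.0], "HC")]
pool = ["the boy is on the stool", "the water is overflowing"]
aug = build_augmented_set(real, pool, fake_tts, factor=3)
print(len(aug))  # 2 real + 2 * 3 synthetic = 8
```

Passing the TTS model and classifier as callables keeps the sketch independent of any particular CosyVoice2 or F5-TTS interface; in practice the augmentation factor would be tuned on held-out data, as the abstract's augmentation-factor analysis suggests.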