This paper introduces a CLAP-based approach for automatic word-naming recognition in post-stroke aphasia, addressing the limitations of traditional ASR systems with disfluent speech. The method frames word recognition as an audio-text matching problem, projecting speech and textual prompts into a shared embedding space. Experiments on two French aphasia datasets demonstrate up to 90% accuracy, surpassing classification and ASR baselines.
CLAP-based audio-text matching enables accurate word recognition in aphasic speech, even with disfluencies that stump traditional ASR systems.
Conventional automatic word-naming recognition systems struggle to recognize words spoken by post-stroke patients with aphasia because of disfluencies and mispronunciations, which limits reliable automated assessment in this population. In this paper, we propose a Contrastive Language-Audio Pretraining (CLAP) based approach to automatic word-naming recognition that addresses this challenge by leveraging text-audio alignment. Our approach treats word-naming recognition as an audio-text matching problem, projecting speech signals and textual prompts into a shared embedding space to identify intended words even in challenging recordings. Evaluated on two speech datasets of French post-stroke patients with aphasia, our approach achieves up to 90% accuracy, outperforming existing classification-based and automatic speech recognition (ASR) based baselines.
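The audio-text matching step described in the abstract can be illustrated with a minimal sketch. It assumes the audio recording and each candidate word prompt have already been encoded into the shared embedding space; the toy three-dimensional vectors, the `match_word` helper, the candidate words, and the temperature value are all hypothetical and are not taken from the paper. The sketch scores each candidate by cosine similarity and picks the highest-scoring word.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_word(audio_emb, text_embs, temperature=0.07):
    """Pick the candidate word whose text embedding is closest to the audio.

    audio_emb: embedding of the spoken attempt (assumed precomputed).
    text_embs: dict mapping candidate word -> text-prompt embedding.
    temperature: softmax scaling; 0.07 is an illustrative value only.
    Returns (best_word, probabilities over candidates).
    """
    words = list(text_embs)
    logits = [cosine(audio_emb, text_embs[w]) / temperature for w in words]
    # Numerically stable softmax over the similarity logits.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = {w: e / total for w, e in zip(words, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Toy embeddings standing in for encoder outputs (hypothetical values).
text_embs = {
    "chat": [0.9, 0.1, 0.0],
    "chien": [0.1, 0.9, 0.1],
    "maison": [0.0, 0.2, 0.9],
}
audio_emb = [0.7, 0.3, 0.1]  # imagined noisy attempt at "chat"

best, probs = match_word(audio_emb, text_embs)
print(best, probs)
```

Because the decision is a nearest-prompt lookup rather than a transcription, an imperfect utterance can still land closest to the intended word, which is the intuition behind framing the task as matching.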