This paper introduces AP-GRPO, a framework for recovering the intended text from distorted pathological speech by exploiting the words and short phrases that remain reliably audible in a recording. It combines an anchor-gated reward with an inter-anchor phonetic alignment reward, and it improves reconstruction fidelity across four disease conditions. The learned anchor constraint also adapts to each condition, yielding interpretable disease-specific profiles.
AP-GRPO shows that reliable audible anchors improve the reconstruction of distorted speech, and that the strength of anchor enforcement it learns adapts to each disease condition.
Pathological speech from patients with neurodegenerative and neuromotor disorders is often acoustically distorted and linguistically fragmented. Pathological speech reconstruction is therefore needed to recover the intended textual content from distorted and incomplete recordings. Crucially, such recordings are rarely uniformly degraded: some words or short phrases remain reliable and can serve as audible anchors for reconstructing the corrupted surrounding content. We introduce Anchor-gated Phonetic Group Relative Policy Optimization (AP-GRPO), a GRPO framework with a phonetic reward that aligns speech language models (SLMs) through audible-anchor preservation and inter-anchor phonetic compatibility with the original speech signal. AP-GRPO consists of: (i) an anchor-gated reward that matches reliable audible anchors in clear regions; and (ii) an inter-anchor phonetic alignment reward that evaluates whether the recovered content is phonetically supported by the corresponding corrupted inter-anchor speech span. Across four disease conditions, AP-GRPO improves faithful speech reconstruction, and the learned anchor constraint automatically adapts to each condition, revealing interpretable disease-specific profiles: conditions with severe articulatory degradation require stronger anchor enforcement, whereas conditions with milder or primarily linguistic impairment rely more on phonetic alignment for inter-anchor recovery.
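The two reward components and the group-relative step can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the reward formulas, so the in-order anchor matching, the edit-distance phonetic score, the weighted-sum combination, and the names `anchor_score`, `phonetic_score`, `toy_reward`, and `lam` are all assumptions made for illustration. Only the group-normalized advantage (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation.

```python
from statistics import mean, pstdev


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def anchor_score(hyp_words, anchors):
    """Assumed anchor reward: fraction of anchors found, in order, in the hypothesis."""
    if not anchors:
        return 1.0
    pos, hits = 0, 0
    for anchor in anchors:
        try:
            pos = hyp_words.index(anchor, pos) + 1
            hits += 1
        except ValueError:
            pass  # anchor missing; keep searching from the same position
    return hits / len(anchors)


def phonetic_score(hyp_phones, span_phones):
    """Assumed phonetic reward: 1 minus the normalized phoneme edit distance."""
    longest = max(len(hyp_phones), len(span_phones))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(hyp_phones, span_phones) / longest


def toy_reward(a_score, p_score, lam=0.5):
    """Hypothetical combination; lam stands in for the anchor-enforcement strength."""
    return lam * a_score + (1.0 - lam) * p_score


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: rewards normalized within one sampled group."""
    mu, sd = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


if __name__ == "__main__":
    anchors = ["please", "water"]
    span = ["g", "ih", "v", "m", "iy"]  # phonemes heard in the corrupted span (toy data)
    candidates = [
        ("please give me water".split(), ["g", "ih", "v", "m", "iy"]),
        ("give me water".split(), ["g", "ih", "v", "m", "iy"]),
        ("please take the water".split(), ["t", "ey", "k", "dh", "ah"]),
    ]
    rewards = [toy_reward(anchor_score(w, anchors), phonetic_score(p, span))
               for w, p in candidates]
    print(rewards, group_advantages(rewards))
```

In this toy setup, a larger `lam` penalizes candidates that drop an anchor more heavily, while a smaller `lam` shifts the weight toward phonetic agreement with the corrupted span. That loosely mirrors the trade-off the abstract describes between severe articulatory degradation and milder or linguistic impairment.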