This paper introduces RRP-Voice, the first longitudinal dataset for detecting Recurrent Respiratory Papillomatosis (RRP) from voice recordings, addressing the data scarcity that limits work on rare laryngeal diseases. The dataset includes recordings from 26 patients with up to ten years of follow-up. Each session pairs sustained vowels with sentence-level utterances, annotated by otolaryngologists and confirmed against synchronous laryngoscopy. The authors establish a benchmark spanning handcrafted features, end-to-end deep networks, self-supervised pretrained models, and audio large language models. Per-subject longitudinal analyses indicate that the discriminative signal reflects disease state rather than stable speaker characteristics, which supports the use of voice for low-resource clinical monitoring.
Voice recordings can reveal the oscillating states of Recurrent Respiratory Papillomatosis, providing a unique longitudinal perspective on a rare laryngeal disease.
Deep learning has advanced pathological voice detection rapidly, yet rare laryngeal diseases remain underexplored due to data scarcity. Recurrent Respiratory Papillomatosis (RRP) exemplifies this gap: an HPV-induced disease of the larynx in which patients oscillate between recurrence and post-surgical remission over the years. RRP demands continuous voice monitoring that existing cross-sectional corpora cannot support. We introduce the first longitudinal voice dataset for RRP, comprising recordings from 26 patients with up to ten years of follow-up. Each session pairs sustained vowels with sentence-level utterances, which are annotated by otolaryngologists and confirmed synchronously with laryngoscopy. Building on this resource, we establish a systematic benchmark spanning handcrafted features, end-to-end deep networks, self-supervised pretrained models, and recent audio large language models, all evaluated under session-level cross-validation with patient-level audit. Per-subject longitudinal analyses further confirm that the cross-sectional discriminative signal reflects laryngoscopic disease state rather than stable speaker attributes. This work lays a foundation for rare longitudinal pathological voice tasks in low-resource clinical settings.
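The evaluation protocol named in the abstract, session-level cross-validation with a patient-level audit, can be illustrated with a minimal sketch. The abstract does not define the audit, so the version below is an assumption: folds are formed over recording sessions, and the audit simply reports which patients contribute sessions to both the training and test side of a fold. The function names, the `(patient_id, session_id)` record layout, and the toy metadata are all hypothetical and not taken from the paper.

```python
import random

def session_level_folds(session_ids, n_folds=5, seed=0):
    """Shuffle the unique sessions and deal them round-robin into n_folds test sets."""
    unique = sorted(set(session_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    return [set(unique[i::n_folds]) for i in range(n_folds)]

def patient_overlap_audit(records, test_sessions):
    """Return the patients who have sessions in both the train and test split.

    Assumed meaning of "patient-level audit": a check on how much of a fold's
    test data comes from speakers already seen during training.
    """
    train_patients, test_patients = set(), set()
    for patient_id, session_id in records:
        if session_id in test_sessions:
            test_patients.add(patient_id)
        else:
            train_patients.add(patient_id)
    return train_patients & test_patients

# Hypothetical toy metadata: one (patient_id, session_id) pair per recording.
records = [
    (f"P{p}", f"P{p}-S{s}")
    for p in range(4)        # 4 patients
    for s in range(3)        # 3 sessions per patient
    for _ in range(2)        # 2 recordings per session
]

folds = session_level_folds([session for _, session in records], n_folds=3)
for k, test_sessions in enumerate(folds):
    shared = patient_overlap_audit(records, test_sessions)
    print(f"fold {k}: {len(test_sessions)} test sessions, "
          f"{len(shared)} patients in both splits")
```

Splitting by session keeps all recordings from one visit on the same side of a fold, while the audit makes speaker overlap visible, which matters when the claim is that a model tracks disease state rather than speaker identity.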