MIT CSAIL, Eastern Memorial Hospital, Girls High School, IIS Academia Sinica, National Taiwan Normal University, NTU Taiwan, USC, Yuan Ze University

Jun 1, 2026

arXiv:2606.01639

RRP-Voice: A Longitudinal Dataset and Benchmark for Recurrent Respiratory Papillomatosis Detection

Wenze Ren, Ke-Han Lu, Kai-Wei Chang, Tiantian Feng, Ching Fang, Zhi-Chi Liao, Dao Thi Hai Yen, Syu-Siang Wang, Yu Tsao, Chi-Te Wang, Shih-Hau Fang

AI Summary

This paper introduces RRP-Voice, the first longitudinal voice dataset for detecting Recurrent Respiratory Papillomatosis (RRP), addressing data scarcity in rare laryngeal diseases. The dataset includes recordings from 26 patients with up to ten years of follow-up. Each session pairs sustained vowels with sentence-level utterances, annotated by otolaryngologists and confirmed with laryngoscopy. The authors establish a benchmark spanning handcrafted features, end-to-end deep networks, self-supervised pretrained models, and audio large language models, and report that the discriminative signal reflects disease state rather than stable speaker characteristics, laying a foundation for low-resource clinical voice monitoring.

Key Contribution

Voice recordings can reveal the oscillating states of Recurrent Respiratory Papillomatosis, providing a unique longitudinal perspective on a rare laryngeal disease.

Abstract

Deep learning has advanced pathological voice detection rapidly, yet rare laryngeal diseases remain underexplored due to data scarcity. Recurrent Respiratory Papillomatosis (RRP) exemplifies this gap: an HPV-induced disease of the larynx in which patients oscillate between recurrence and post-surgical remission over the years. RRP demands continuous voice monitoring that existing cross-sectional corpora cannot support. We introduce the first longitudinal voice dataset for RRP, comprising recordings from 26 patients with up to ten years of follow-up. Each session pairs sustained vowels with sentence-level utterances, which are annotated by otolaryngologists and confirmed synchronously with laryngoscopy. Building on this resource, we establish a systematic benchmark spanning handcrafted features, end-to-end deep networks, self-supervised pretrained models, and recent audio large language models, all evaluated under session-level cross-validation with patient-level audit. Per-subject longitudinal analyses further confirm that the cross-sectional discriminative signal reflects laryngoscopic disease state rather than stable speaker attributes. This work lays a foundation for rare longitudinal pathological voice tasks in low-resource clinical settings.
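The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of one plausible reading of "session-level cross-validation with patient-level audit": folds are formed over recording sessions, and each fold is then checked for patients who appear on both the training and test side. The function names, the toy session IDs, and the interpretation of "audit" as an overlap check are all assumptions for illustration, not the authors' implementation.

```python
import random
from typing import Dict, Iterator, List, Set, Tuple


def session_level_folds(
    session_ids: List[str], k: int, seed: int = 0
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_sessions, test_sessions); each session is tested exactly once."""
    unique = sorted(set(session_ids))
    if not 2 <= k <= len(unique):
        raise ValueError("k must be between 2 and the number of sessions")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Deal the shuffled sessions round-robin into k folds.
    folds = [unique[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


def shared_patients(
    train: List[str], test: List[str], session_to_patient: Dict[str, str]
) -> Set[str]:
    """Assumed 'patient-level audit': patients with sessions in both train and test."""
    train_patients = {session_to_patient[s] for s in train}
    test_patients = {session_to_patient[s] for s in test}
    return train_patients & test_patients


# Hypothetical toy mapping: three patients, two sessions each.
session_to_patient = {
    "p1_s1": "p1", "p1_s2": "p1",
    "p2_s1": "p2", "p2_s2": "p2",
    "p3_s1": "p3", "p3_s2": "p3",
}

for fold, (train, test) in enumerate(session_level_folds(list(session_to_patient), k=3)):
    overlap = shared_patients(train, test, session_to_patient)
    print(f"fold {fold}: test={sorted(test)} shared_patients={sorted(overlap)}")
```

In a longitudinal setting the same patient contributes sessions in different disease states, so some overlap across the split is expected under session-level folds. Reporting it per fold makes it possible to judge how much a score could depend on speaker identity rather than disease state.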

Data Curation & Synthetic Data, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
