This study introduces NüshuVoice, the first text-to-speech (TTS) benchmark for the endangered Nüshu script. To address the scarcity of acoustic data, the authors construct a sentence-level dataset that aligns standardized Unicode Nüshu text, phonetic transcriptions, standard Chinese translations, and archival recordings. They propose Nüshu-PitchVITS, an F0-conditioned VITS framework that uses Nüshu's five-level pitch notation as an explicit prosodic cue. In the reported experiments, Nüshu-PitchVITS outperforms strong TTS baselines in spectral fidelity, pitch reconstruction, and human-rated intelligibility.
Conditioning VITS on Nüshu's five-level pitch notation yields better synthesis quality than strong TTS baselines in an extreme low-resource setting, and NüshuVoice gives the endangered script its first TTS benchmark.
Nüshu is an endangered phonetic script historically used by women in Jiangyong County, southern Hunan, China. While existing computational studies of Nüshu mainly focus on textual digitization and visual recognition, the acoustic reconstruction of its authentic pronunciation remains largely unexplored. Building a Nüshu text-to-speech (TTS) system is particularly challenging because available recordings are extremely limited and mostly consist of isolated syllable-level pronunciations rather than natural sentence-level utterances. In this work, we introduce NüshuVoice, the first TTS benchmark for Nüshu. We construct a sentence-level Nüshu text-to-audio dataset that aligns standardized Unicode Nüshu text, phonetic transcriptions, standard Chinese translations, and archival recordings. To synthesize speech under this extreme low-resource setting, we propose Nüshu-PitchVITS, an F0-conditioned VITS framework that leverages Nüshu's five-level pitch notation as an explicit prosodic inductive bias. Experimental results show that Nüshu-PitchVITS outperforms strong TTS baselines in spectral fidelity, pitch reconstruction, and human-rated intelligibility. We publicly release the dataset and code at: https://anonymous.4open.science/r/Nvshu-TTS-2EB6.
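As an illustration of how a five-level pitch notation could serve as an explicit F0 condition, the sketch below turns a tone string such as "35" into a per-frame F0 contour. This is not the authors' implementation: the function name `tone_to_f0`, the 120 to 300 Hz speaker range, the even spacing of the five levels in log-frequency, and the linear interpolation between tone targets are all assumptions made for the example.

```python
import math


def tone_to_f0(tone: str, n_frames: int,
               f0_min: float = 120.0, f0_max: float = 300.0) -> list[float]:
    """Map a five-level tone string (e.g. "35") to a per-frame F0 contour in Hz.

    Hypothetical helper: the pitch range and the log-linear level spacing
    are illustrative assumptions, not values taken from the paper.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    levels = [int(ch) for ch in tone]
    if not levels or any(lv < 1 or lv > 5 for lv in levels):
        raise ValueError("tone must be a non-empty string of digits 1-5")

    # Level 1 maps to f0_min and level 5 to f0_max, evenly spaced in log-F0.
    log_min, log_max = math.log(f0_min), math.log(f0_max)
    targets = [log_min + (lv - 1) / 4 * (log_max - log_min) for lv in levels]

    # A single target (or a single frame) gives a flat contour at the first target.
    if len(targets) == 1 or n_frames == 1:
        return [math.exp(targets[0])] * n_frames

    contour = []
    for i in range(n_frames):
        # Position of this frame along the sequence of tone targets.
        pos = i / (n_frames - 1) * (len(targets) - 1)
        lo = min(int(pos), len(targets) - 2)
        frac = pos - lo
        # Linear interpolation in the log-F0 domain between adjacent targets.
        contour.append(math.exp((1 - frac) * targets[lo] + frac * targets[lo + 1]))
    return contour


if __name__ == "__main__":
    # A rising tone "35" stretched over 5 frames.
    print([round(f, 1) for f in tone_to_f0("35", 5)])
```

In an F0-conditioned model, a contour of this kind would typically be supplied alongside the text encoding as an additional input; how Nüshu-PitchVITS injects it is described in the paper itself, not here.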