CMU ML | Independent Researcher | University | Jun 18, 2026 | arXiv:2606.20179

ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion

Maxim Melichov, Yakov Kolani, Morris Alper

AI Summary

This paper introduces ReNikud, a novel approach to grapheme-to-phoneme (G2P) conversion for Modern Hebrew that leverages weak audio supervision and a pseudo-vocalization architecture to address the challenges posed by the language's abjad writing system. By utilizing a phoneme-based automatic speech recognition pipeline on extensive unlabeled audio data, ReNikud generates phonemic transcriptions that accurately reflect natural spoken language, overcoming limitations of traditional methods that rely on scarce vocalization data. The results demonstrate that ReNikud outperforms existing state-of-the-art G2P systems on both established benchmarks and a new targeted benchmark for spoken Hebrew, indicating its effectiveness for applications like text-to-speech.

Key Contribution

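Weak audio supervision, combined with a pseudo-vocalization architecture that predicts phonemes at each character position, lets ReNikud surpass previous state-of-the-art Hebrew grapheme-to-phoneme methods, which are limited by scarce vocalization data and by formal rather than spoken pronunciation norms.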

Abstract

Grapheme-to-phoneme (G2P) conversion for Modern Hebrew is needed for applications like text-to-speech (TTS), but is challenging due to the language's abjad writing system, which leaves vowels largely unwritten, creating substantial ambiguity. Standard approaches first predict vowel diacritics (nikud) to produce International Phonetic Alphabet (IPA) transcriptions, but this is limited: vocalization data is scarce and laborious to produce, it does not specify features such as lexical stress, and it reflects formal grammatical rules rather than everyday spoken pronunciation. Direct sequence-to-sequence IPA prediction, meanwhile, struggles on limited data and fails to exploit the character-level alignment characteristic of abjads. Our method, ReNikud, overcomes these limitations with two key insights: (1) Weak audio supervision via a phoneme-based automatic speech recognition (ASR) pseudo-labeling pipeline on thousands of hours of unlabeled Hebrew audio, yielding phonemic transcriptions that reflect natural spoken norms without manual annotation. (2) A pseudo-vocalization architecture that predicts IPA phonemes at each character position, enforcing character-level alignment as an inductive bias. Results on existing Hebrew G2P benchmarks and the new targeted MILIM benchmark for spoken Hebrew show that ReNikud surpasses previous state-of-the-art methods. We will release our code and trained models to support further work on Hebrew TTS and speech technologies.
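As an illustration of the character-level alignment described in point (2), the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical label scheme in which each Hebrew character is assigned a short IPA string (possibly empty), so that a word's transcription is the concatenation of its per-character labels. The function names `join_char_aligned` and `alignment_pairs`, and the hand-written labels in the example, are assumptions made for this sketch; the abstract does not specify the actual label inventory or model.

```python
from typing import List, Tuple


def join_char_aligned(chars: List[str], labels: List[str]) -> str:
    """Concatenate per-character IPA labels into one transcription.

    Assumes exactly one label per input character; a label may be the
    empty string when a character contributes no sound.
    """
    if len(chars) != len(labels):
        raise ValueError("expected exactly one label per character")
    return "".join(labels)


def alignment_pairs(chars: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Return (character, label) pairs, making the alignment explicit."""
    if len(chars) != len(labels):
        raise ValueError("expected exactly one label per character")
    return list(zip(chars, labels))


# Hand-written example (not model output): the word "shalom".
# The vowel /a/ is unwritten in the text, so in this hypothetical
# scheme it is attached to the label of the first consonant.
chars = ["ש", "ל", "ו", "ם"]
labels = ["ʃa", "l", "o", "m"]

transcription = join_char_aligned(chars, labels)
pairs = alignment_pairs(chars, labels)
```

Because the output length is tied to the input length, a model of this shape only has to classify each character position rather than generate a free-form sequence, which is the inductive bias the abstract contrasts with direct sequence-to-sequence IPA prediction.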

Natural Language Processing | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
