CMU ML · Mar 30, 2026 · arXiv:2603.29042

An Empirical Recipe for Universal Phone Recognition

Shikhar Bharadwaj, Chin-Jou Li, Kwanghee Choi, Eunjung Yeo, William Chen, Shinji Watanabe, David R. Mortensen

AI Summary

The authors trained PhoneticXEUS, a phone recognition model, on large-scale multilingual data, achieving state-of-the-art performance on both multilingual (17.7% PFER) and accented English speech (10.6% PFER). Through ablations across 100+ languages, they quantified the impact of self-supervised learning (SSL) representations, data scale, and loss objectives on phone recognition performance. Their analysis reveals error patterns across language families, accented speech, and articulatory features, providing insights into the challenges of universal phone recognition.

Key Contribution

A training recipe for universal phone recognition that combines large-scale multilingual data with SSL representations, reaching state-of-the-art performance in evaluations across 100+ languages.

Abstract

Phone recognition (PR) is a key enabler of multilingual and low-resource speech processing tasks, yet robust performance remains elusive. Highly performant English-focused models do not generalize across languages, while multilingual models underutilize pretrained representations. It also remains unclear how data scale, architecture, and training objective contribute to multilingual PR. We present PhoneticXEUS, a model trained on large-scale multilingual data that achieves state-of-the-art performance on both multilingual (17.7% PFER) and accented English speech (10.6% PFER). Through controlled ablations with evaluations across 100+ languages under a unified scheme, we empirically establish our training recipe and quantify the impact of SSL representations, data scale, and loss objectives. In addition, we analyze error patterns across language families, accented speech, and articulatory features. All data and code are released openly.
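The abstract reports results in PFER without defining it. Assuming PFER denotes a phonetic feature error rate (an edit distance in which substituting one phone for another costs the fraction of articulatory features that differ), the sketch below shows how such a score could be computed. The feature table `TOY_FEATURES`, the function names, and the normalization by reference length are all illustrative assumptions, not the authors' implementation, and the paper's exact definition may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical toy table: phone -> binary articulatory features
# (voiced, nasal, labial, coronal, continuant). Illustrative only;
# a real system would use a full phonological feature inventory.
TOY_FEATURES: Dict[str, Sequence[int]] = {
    "p": (0, 0, 1, 0, 0),
    "b": (1, 0, 1, 0, 0),
    "m": (1, 1, 1, 0, 0),
    "t": (0, 0, 0, 1, 0),
    "d": (1, 0, 0, 1, 0),
    "n": (1, 1, 0, 1, 0),
    "s": (0, 0, 0, 1, 1),
    "z": (1, 0, 0, 1, 1),
}


def substitution_cost(a: str, b: str) -> float:
    """Fraction of features on which phones a and b disagree (0.0 to 1.0)."""
    fa, fb = TOY_FEATURES[a], TOY_FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def feature_error_rate(ref: List[str], hyp: List[str]) -> float:
    """Feature-weighted edit distance between ref and hyp, divided by len(ref).

    Assumption: insertions and deletions cost 1.0 each.
    """
    if not ref:
        raise ValueError("reference must contain at least one phone")
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum cost of aligning ref[:i] with hyp[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)  # delete all of ref[:i]
    for j in range(1, m + 1):
        dp[0][j] = float(j)  # insert all of hyp[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # deletion
                dp[i][j - 1] + 1.0,  # insertion
                dp[i - 1][j - 1] + substitution_cost(ref[i - 1], hyp[j - 1]),
            )
    return dp[n][m] / n
```

Under this sketch, confusing /p/ with /b/ (a voicing error only) is penalized far less than deleting the phone outright, which is why a feature-based rate can be more informative than a plain phone error rate when comparing systems across languages.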

Data Curation & Synthetic Data · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 48

Year: 2026

Venue: N/A
