Mar 3, 2026 | arXiv:2603.02860

The Distribution of Phoneme Frequencies across the World's Languages: Macroscopic and Microscopic Information-Theoretic Models

Fermín Moscoso del Prado Martín, Suchir Salhan

AI Summary

This paper models phoneme frequency distributions across languages using information theory at both macroscopic and microscopic levels. Macroscopically, the authors show that phoneme rank-frequency distributions follow the order statistics of a symmetric Dirichlet distribution whose concentration parameter scales with phonemic inventory size, indicating a compensation effect. Microscopically, the authors use a Maximum Entropy model with articulatory, phonotactic, and lexical constraints to predict language-specific phoneme probabilities.
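To make the macroscopic claim concrete, the following is a minimal sketch of what "order statistics of a symmetric Dirichlet distribution" means in practice. It is not the authors' code. The function names, the inventory size of 30, the concentration value of 1.0, and the Monte Carlo approach are all illustrative assumptions. "Relative entropy" is taken here to mean Shannon entropy divided by its maximum, log N, which is an assumption about the paper's definition.

```python
import math
import random


def expected_rank_frequencies(n_phonemes, alpha, n_samples=2000, seed=0):
    """Monte Carlo estimate of the expected sorted probabilities
    (order statistics) of a symmetric Dirichlet(alpha) over n_phonemes."""
    rng = random.Random(seed)
    totals = [0.0] * n_phonemes
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of independent Gamma(alpha, 1)
        # variates normalized to sum to one.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_phonemes)]
        norm = sum(gammas)
        probs = sorted((g / norm for g in gammas), reverse=True)
        for rank, p in enumerate(probs):
            totals[rank] += p
    return [t / n_samples for t in totals]


def relative_entropy(probs):
    """Shannon entropy divided by log(N); assumed definition, see lead-in."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


# Hypothetical inventory of 30 phonemes with an illustrative alpha of 1.0.
freqs = expected_rank_frequencies(n_phonemes=30, alpha=1.0)
rel_h = relative_entropy(freqs)
print([round(f, 4) for f in freqs[:5]], round(rel_h, 4))
```

Rerunning the sketch with different values of `alpha` and `n_phonemes` shows how a single concentration parameter shapes the whole rank-frequency curve. How that parameter actually scales with inventory size is an empirical result of the paper and is not reproduced here.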

Key Contribution

Phoneme frequency distributions across languages aren't random noise; they follow predictable patterns governed by inventory size and linguistic constraints, offering a unified information-theoretic account.

Abstract

We demonstrate that the frequency distribution of phonemes across languages can be explained at both macroscopic and microscopic levels. Macroscopically, phoneme rank-frequency distributions closely follow the order statistics of a symmetric Dirichlet distribution whose single concentration parameter scales systematically with phonemic inventory size, revealing a robust compensation effect whereby larger inventories exhibit lower relative entropy. Microscopically, a Maximum Entropy model incorporating constraints from articulatory, phonotactic, and lexical structure accurately predicts language-specific phoneme probabilities. Together, these findings provide a unified information-theoretic account of phoneme frequency structure.
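For the microscopic level, the general shape of a Maximum Entropy model can be sketched as follows. This is a generic illustration, not the authors' model. The four "phonemes", the two binary features, the target expectations, the learning rate, and the function names are all hypothetical. The paper's actual articulatory, phonotactic, and lexical constraints are not specified in this abstract. The sketch uses the standard result that the maximum-entropy distribution under feature-expectation constraints has the exponential form p_i ∝ exp(Σ_k λ_k f_k(i)), and fits the weights by plain gradient ascent on the dual objective.

```python
import math


def maxent_probs(features, lambdas):
    """Exponential-family distribution: p_i proportional to
    exp(sum_k lambdas[k] * features[i][k])."""
    scores = [sum(l * f for l, f in zip(lambdas, row)) for row in features]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def fit_maxent(features, targets, lr=0.5, n_iter=5000):
    """Gradient ascent on the dual: move each weight until the model's
    expected feature value matches its target."""
    lambdas = [0.0] * len(targets)
    for _ in range(n_iter):
        probs = maxent_probs(features, lambdas)
        for k, target in enumerate(targets):
            expected = sum(p * row[k] for p, row in zip(probs, features))
            lambdas[k] += lr * (target - expected)
    return lambdas


# Hypothetical toy setup: 4 phonemes described by 2 binary features.
features = [[1, 0], [1, 1], [0, 1], [0, 0]]
targets = [0.6, 0.3]  # assumed target expectation for each feature

lambdas = fit_maxent(features, targets)
probs = maxent_probs(features, lambdas)
print([round(p, 4) for p in probs])
```

Among all distributions that satisfy the constraints, the fitted one is the least committal, which is the usual reason for choosing a Maximum Entropy formulation.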

Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 49

Year: 2026

Venue: N/A
