This paper models phoneme frequency distributions across languages using information theory at both macroscopic and microscopic levels. Macroscopically, the authors show that phoneme rank-frequency distributions follow the order statistics of a symmetric Dirichlet distribution whose concentration parameter scales with phonemic inventory size, indicating a compensation effect. Microscopically, they use a Maximum Entropy model with articulatory, phonotactic, and lexical constraints to predict language-specific phoneme probabilities.
Phoneme frequency distributions across languages aren't random noise; they follow predictable patterns governed by inventory size and linguistic constraints, offering a unified information-theoretic account.
We demonstrate that the frequency distribution of phonemes across languages can be explained at both macroscopic and microscopic levels. Macroscopically, phoneme rank-frequency distributions closely follow the order statistics of a symmetric Dirichlet distribution whose single concentration parameter scales systematically with phonemic inventory size, revealing a robust compensation effect whereby larger inventories exhibit lower relative entropy. Microscopically, a Maximum Entropy model incorporating constraints from articulatory, phonotactic, and lexical structure accurately predicts language-specific phoneme probabilities. Together, these findings provide a unified information-theoretic account of phoneme frequency structure.
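To make the macroscopic claim concrete, the following is a minimal sketch of what "order statistics of a symmetric Dirichlet distribution" means for a rank-frequency curve. It is an illustration, not the authors' code: the function name `expected_rank_frequencies`, the sample count, and the example values of the inventory size and concentration parameter are all assumptions chosen for demonstration. Each simulated "language" is a draw from a symmetric Dirichlet, its frequencies are sorted in descending order, and the sorted vectors are averaged by rank.

```python
import random


def expected_rank_frequencies(n_phonemes, alpha, n_samples=2000, seed=0):
    """Monte Carlo estimate of the expected frequency at each rank.

    Hypothetical helper for illustration: n_phonemes is the inventory
    size and alpha is the single concentration parameter of the
    symmetric Dirichlet distribution.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_phonemes
    for _ in range(n_samples):
        # A symmetric Dirichlet draw is a vector of i.i.d.
        # Gamma(alpha, 1) variates divided by their sum.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_phonemes)]
        norm = sum(gammas)
        # Sorting in descending order gives the order statistics,
        # i.e. the rank-frequency curve of this simulated language.
        ranked = sorted((g / norm for g in gammas), reverse=True)
        for rank, freq in enumerate(ranked):
            totals[rank] += freq
    # Average across simulated languages, rank by rank.
    return [t / n_samples for t in totals]


if __name__ == "__main__":
    # Example values only; the source does not report these numbers.
    for alpha in (0.5, 1.0, 2.0):
        curve = expected_rank_frequencies(n_phonemes=30, alpha=alpha)
        print(alpha, [round(f, 3) for f in curve[:3]])
```

Under this sketch, smaller values of the concentration parameter yield steeper curves in which a few phonemes dominate, while larger values yield flatter ones. How the parameter actually scales with inventory size is the paper's empirical finding and is not reproduced here.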