The authors introduce BabAR, a cross-linguistic phoneme recognition system for child speech, trained on TinyVox, a new corpus of more than half a million phonetically transcribed child vocalizations in English, French, Portuguese, German, and Spanish. Pretraining on multilingual child-centered daylong recordings substantially outperforms alternatives, and providing 20 seconds of surrounding audio context during fine-tuning improves performance further. BabAR's automatic measures of speech maturity align with developmental estimates from the literature, which supports its use in large-scale studies of speech development.
BabAR, a new cross-linguistic phoneme recognition system for child speech, is designed to make large-scale analysis of early speech development feasible.
Studying early speech development at scale requires automatic tools, yet automatic phoneme recognition, especially for young children, remains largely unsolved. Building on decades of data collection, we curate TinyVox, a corpus of more than half a million phonetically transcribed child vocalizations in English, French, Portuguese, German, and Spanish. We use TinyVox to train BabAR, a cross-linguistic phoneme recognition system for child speech. We find that pretraining the system on multilingual child-centered daylong recordings substantially outperforms alternatives, and that providing 20 seconds of surrounding audio context during fine-tuning further improves performance. Error analyses show that substitutions predominantly fall within the same broad phonetic categories, suggesting suitability for coarse-grained developmental analyses. We validate BabAR by showing that its automatic measures of speech maturity align with developmental estimates from the literature.
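The error analysis mentioned in the abstract (substitutions staying within the same broad phonetic category) can be illustrated with a minimal sketch. This is not the authors' implementation: the `BROAD_CLASS` mapping, the category names, and the `within_class_rate` helper are all hypothetical, and the input is assumed to be reference/hypothesis phoneme pairs that have already been aligned.

```python
from collections import Counter

# Hypothetical, deliberately small mapping from phoneme symbols to broad
# classes. The paper's actual category inventory is not given in the abstract.
BROAD_CLASS = {
    "p": "plosive", "b": "plosive", "t": "plosive",
    "d": "plosive", "k": "plosive", "g": "plosive",
    "m": "nasal", "n": "nasal",
    "s": "fricative", "z": "fricative", "f": "fricative", "v": "fricative",
    "i": "vowel", "e": "vowel", "a": "vowel", "o": "vowel", "u": "vowel",
}


def within_class_rate(pairs):
    """Return (rate, counts) for already-aligned (reference, hypothesis) pairs.

    rate is the share of substitutions whose two phonemes belong to the same
    broad class, computed over substitutions where both classes are known.
    """
    counts = Counter()
    for ref, hyp in pairs:
        if ref == hyp:
            continue  # correct recognition, not a substitution
        ref_cls = BROAD_CLASS.get(ref)
        hyp_cls = BROAD_CLASS.get(hyp)
        if ref_cls is None or hyp_cls is None:
            counts["unknown"] += 1  # symbol missing from the toy mapping
        elif ref_cls == hyp_cls:
            counts["within"] += 1
        else:
            counts["across"] += 1
    known = counts["within"] + counts["across"]
    rate = counts["within"] / known if known else 0.0
    return rate, dict(counts)


if __name__ == "__main__":
    aligned = [("p", "b"), ("t", "d"), ("s", "a"), ("m", "m"), ("k", "x")]
    rate, counts = within_class_rate(aligned)
    print(f"within-class substitution rate: {rate:.2f}", counts)
```

A high within-class rate would mean that most recognition errors confuse similar sounds (for example, one plosive for another), which is the property the abstract cites as making the system suitable for coarse-grained developmental analyses.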