May 28, 2026 · arXiv:2605.29257

ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood

Tiantian Feng, Anfeng Xu, Xuan Shi, Aditya Kommineni, Shakhrul Iman Siam, Megan Micheletti, Zhonghao Shi, Helen Tager-Flusberg, Mi Zhang, Lynn K. Perry, Catherine Lord, Daniel S. Messinger, Shrikanth S. Narayanan

AI Summary

The authors introduce ChildVox, a benchmark designed to evaluate audio and speech models on a diverse range of child-related acoustic signals, spanning from physiological sounds to spoken language across developmental stages. It integrates over 20 sub-tasks across 17 datasets, facilitating cross-corpus and cross-domain comparisons. Evaluations of various audio and speech foundation models reveal ChildVox's utility in identifying high-performing models for recognizing children's acoustic signals and supporting applications like language level assessment and speech production tracking.

Key Contribution

ChildVox reveals the current capabilities and limitations of audio and speech models in understanding the nuanced and developing acoustic world of children.

Abstract

We present ChildVox, a novel benchmark for characterizing the diverse acoustic signals through which children communicate. Specifically, ChildVox follows the full developmental trajectory from birth through school age, covering physiological sounds, non-linguistic vocalizations, canonical syllables, and spoken language. ChildVox integrates more than 20 sub-tasks across 17 child-centered audio and speech datasets, enabling systematic cross-corpus and cross-domain comparison. We evaluate a representative range of audio and speech foundation models, including self-supervised, ASR-oriented, and large audio-language models, on tasks including physiological sound classification, vocalization and canonical syllable modeling, and speech quality assessment and recognition. Benchmark results show that ChildVox identifies a suite of high-performing models for recognizing a wide range of children's acoustic signals, supporting downstream applications such as characterizing children's language levels and tracking speech production with age.

Data Curation & Synthetic Data · Eval Frameworks & Benchmarks · Speech & Audio

Citation Metrics

Citations: 0
Influential citations: 0
References: 58
Year: 2026
Venue: N/A
