This chapter reviews computational models of early language acquisition, focusing on self-supervised and visually grounded approaches that minimize linguistic priors. It highlights how these models are becoming increasingly powerful at learning aspects of speech and at explaining early language development through a shared set of learning principles. The review also discusses how simulations are becoming more realistic, both in their input data and in how model behavior is linked to empirical findings on infant language development.
Self-supervised and visually grounded models are increasingly able to explain how infants learn language from raw acoustic and visual input, challenging the assumption that strong linguistic priors are required.
Learning to understand speech appears almost effortless for typically developing infants, yet from an information-processing perspective, acquiring a language from acoustic speech is an enormous challenge. This chapter reviews recent developments in using computational models to understand early language acquisition from speech and audiovisual input. The focus is on self-supervised and visually grounded models of perceptual learning. We show how these models are becoming increasingly powerful in learning various aspects of speech without strong linguistic priors, and how many features of early language development can be explained through a shared set of learning principles that are broadly compatible with multiple theories of language acquisition and human cognition. We also discuss how modern learning simulations are gradually becoming more realistic, both in terms of input data and in linking model behavior to empirical findings on infant language development.