This paper introduces BabyMind, an object-first inductive bias for grounded language learning from child-view video, where caregiver speech is sparse and only weakly synchronized with what the child sees. BabyMind extracts candidate object embeddings, tracks them across a short utterance-centered window, and aligns each utterance to the resulting set of tracked objects with a multiple-instance contrastive objective. This targets two ambiguities: when the named referent appears and where it sits in a cluttered frame. On SAYCam-S, BabyMind improves Labeled-S forced-choice accuracy by +2.6 points over CVCL and yields consistent gains on in-vocabulary out-of-distribution benchmarks.
BabyMind outperforms CVCL on SAYCam-S by using an object-first bias that stabilizes contrastive learning on noisy, real-world child-view video.
Learning grounded word meaning from natural experience requires resolving two ambiguities in infant-view recordings: when the named referent appears and where it is in a cluttered frame. In SAYCam-style data, caregiver speech is sparse and weakly synchronized with egocentric video, so single-frame contrastive pairing yields noisy positives in which the intended object is absent or entangled with distractors. We propose BabyMind, an object-first bias for child-view contrastive learning under sparse, noisy supervision. BabyMind extracts candidate object embeddings using an offline mask-based region interface, links candidates across a short utterance-centered window into lightweight object files via tracking, and aligns utterances to bags of object files with a prototype-space multiple-instance contrastive objective. Track-coherence and global-object agreement regularizers stabilize learning and transfer object-file structure into the global frame embedding used at evaluation. On SAYCam-S, BabyMind improves Labeled-S 15 forced-choice accuracy by +2.6 points over CVCL and yields consistent gains on in-vocabulary out-of-distribution benchmarks. Code is available at https://github.com/sathiiii/BabyMind.
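To make the idea of aligning an utterance to a bag of object files concrete, the following is a minimal sketch of a generic multiple-instance contrastive (InfoNCE-style) loss in NumPy. It is not the authors' implementation: the function name `mil_contrastive_loss`, the log-sum-exp pooling over the instances in a bag, and the temperature value are all assumptions made for illustration, and the prototype-space projection and the two regularizers mentioned in the abstract are omitted.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def mil_contrastive_loss(utterances, bags, temperature=0.07):
    """Hypothetical multiple-instance InfoNCE loss (illustrative only).

    utterances: (B, D) array, one embedding per utterance.
    bags:       (B, K, D) array, K object-file embeddings per utterance window.
    Utterance i is treated as a positive for bag i and a negative for the rest.
    """
    u = l2_normalize(utterances)
    o = l2_normalize(bags)
    # sim[i, j, k]: scaled cosine similarity of utterance i and object file k in bag j.
    sim = np.einsum("id,jkd->ijk", u, o) / temperature
    # Soft maximum over the instances in each bag (assumed pooling choice),
    # so one well-matching object file is enough to score the whole bag highly.
    bag_scores = logsumexp(sim, axis=2)  # shape (B, B)
    # Cross-entropy over bags: utterance i should select bag i.
    log_prob = np.diag(bag_scores) - logsumexp(bag_scores, axis=1)
    return -np.mean(log_prob)
```

Pooling over the bag is what lets the loss tolerate distractors: the utterance only needs to match one object file in its window, not every candidate region in the frame.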