This paper investigates capability emergence in neural networks by tracking geometric measures across various model scales, tasks, and layers. The authors find a scale-invariant representation collapse to task-specific floors during training, followed by a top-down propagation through network layers. Furthermore, they observe a geometric hierarchy where representation geometry precedes capability emergence, while other measures like the local learning coefficient lag behind.
Forget bottom-up feature building: neural networks actually learn new skills through a top-down collapse of representations, and the shape of that collapse predicts what they'll learn next.
Capability emergence during neural network training remains mechanistically opaque. We track five geometric measures across five model scales (405K-85M parameters), 120+ emergence events in eight algorithmic tasks, and three Pythia language models (160M-2.8B). We find: (1) training begins with a universal representation collapse to task-specific floors that are scale-invariant across a 210× parameter range (e.g., modular arithmetic collapses to RankMe ≈ 2.0 regardless of model size); (2) collapse propagates top-down through layers (32/32 task × model consistency), contradicting bottom-up feature-building intuition; (3) a geometric hierarchy in which representation geometry leads emergence (75-100% precursor rate for hard tasks), while the local learning coefficient is synchronous (0/24 precursor) and Hessian measures lag. We also delineate prediction limits: geometric measures encode coarse task difficulty but not fine-grained timing (within-class concordance 27%; when task ordering reverses across scales, prediction fails at 26%). On Pythia, global geometric patterns replicate but per-task precursor signals do not: the precursor relationship requires task-training alignment that naturalistic pre-training does not provide. Our contribution is the geometric anatomy of emergence and its boundary conditions, not a prediction tool.
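To make the "collapse to RankMe ≈ 2.0" statement concrete, here is a minimal sketch of the RankMe effective-rank measure as it is commonly defined: the exponential of the Shannon entropy of the normalized singular values of a representation matrix. The abstract does not say how the authors compute it, so the `eps` value, the matrix shapes, and the synthetic Gaussian activations below are assumptions chosen only for illustration; this is not the paper's pipeline.

```python
import numpy as np

def rankme(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank: exp(entropy) of the normalized singular values.

    `embeddings` is an (n_samples, n_features) matrix of activations.
    `eps` is an assumed smoothing constant that keeps log() finite.
    """
    singular_values = np.linalg.svd(embeddings, compute_uv=False)
    p = singular_values / singular_values.sum() + eps
    return float(np.exp(-np.sum(p * np.log(p))))

rng = np.random.default_rng(0)

# Hypothetical "early training" activations: variance spread over all 64 dims.
full = rng.standard_normal((512, 64))

# Hypothetical "collapsed" activations: the same 64-dim space, but all
# variance confined to a random 2-dimensional subspace.
basis, _ = np.linalg.qr(rng.standard_normal((64, 2)))  # (64, 2), orthonormal columns
collapsed = rng.standard_normal((512, 2)) @ basis.T

full_rank = rankme(full)
collapsed_rank = rankme(collapsed)
print(f"full: {full_rank:.2f}, collapsed: {collapsed_rank:.2f}")
```

In this toy setting the collapsed matrix scores close to 2 while the isotropic one scores close to its ambient dimension, which is the sense in which a RankMe floor of about 2.0 indicates that representations occupy roughly two effective dimensions regardless of model width.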