This paper investigates grokking in neural networks and characterizes it as a dimensional phase transition in which the effective dimensionality *D* of the gradient field moves from the sub-diffusive regime (*D* < 1) to the super-diffusive regime (*D* > 1). Using finite-size scaling of gradient avalanche dynamics across eight model scales, the authors show that the onset of generalization coincides with *D* crossing 1, which they interpret as self-organized criticality. They also show that *D* reflects gradient field geometry rather than network architecture: synthetic i.i.d. Gaussian gradients keep *D* ≈ 1 regardless of graph topology, while real training shows a dimensional excess arising from backpropagation correlations.
Grokking is more than memorization followed by generalization: it is a dimensional phase transition in the gradient field, marking a fundamental shift in how networks learn.
Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a \textit{dimensional phase transition}: effective dimensionality~$D$ crosses from sub-diffusive (subcritical, $D < 1$) to super-diffusive (supercritical, $D > 1$) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, $D$ reflects \textbf{gradient field geometry}, not network architecture: synthetic i.i.d.\ Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized $D(t)$ crossing -- robust across topologies -- offers new insight into the trainability of overparameterized networks.
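The abstract does not say how $D$ is extracted from the avalanche data, so the following is only a minimal sketch of a generic finite-size scaling fit, not the authors' procedure. It assumes, purely for illustration, that some avalanche cutoff scale $s_c$ grows with model size $N$ as $s_c \propto N^{D}$, so that $D$ is the slope of a log-log regression. The names `fit_scaling_exponent` and `first_crossing`, the power-law form, and the synthetic numbers are all hypothetical.

```python
import numpy as np


def fit_scaling_exponent(sizes, cutoffs):
    """Estimate D as the least-squares slope of log(cutoff) vs. log(size).

    Assumes the hypothetical power law cutoff ~ size**D; this form is an
    illustration and is not taken from the paper.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_s = np.log(np.asarray(cutoffs, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_s, 1)
    return float(slope)


def first_crossing(d_series, threshold=1.0):
    """Return the first index at which the series rises from below the
    threshold to at or above it, or None if it never does."""
    d = np.asarray(d_series, dtype=float)
    rising = (d[:-1] < threshold) & (d[1:] >= threshold)
    hits = np.nonzero(rising)[0]
    return int(hits[0]) + 1 if hits.size else None


if __name__ == "__main__":
    # Eight synthetic model sizes, echoing the eight scales in the abstract.
    sizes = [2 ** k for k in range(4, 12)]
    # Synthetic cutoffs generated with a known exponent of 1.2 (made-up data).
    cutoffs = [3.0 * n ** 1.2 for n in sizes]
    print("estimated D:", fit_scaling_exponent(sizes, cutoffs))

    # Made-up D(t) trajectory sampled at successive training checkpoints.
    d_of_t = [0.70, 0.80, 0.95, 1.05, 1.20]
    print("first checkpoint with D >= 1:", first_crossing(d_of_t))
```

In this toy setup, repeating the fit at each training checkpoint would give a trajectory $D(t)$, and `first_crossing` would locate the checkpoint at which it passes 1. Whether that matches the paper's estimator depends on definitions the abstract does not give.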