WHU · Apr 6, 2026 · arXiv:2604.04655

Grokking as Dimensional Phase Transition in Neural Networks

AI Summary

This paper studies grokking in neural networks and characterizes it as a dimensional phase transition in which the effective dimensionality *D* of the gradient field crosses from the sub-diffusive regime (*D* < 1) to the super-diffusive regime (*D* > 1). Using finite-size scaling of gradient avalanche dynamics across eight model scales, the authors find that the onset of generalization coincides with *D* crossing 1, which they interpret as self-organized criticality. They also report that *D* reflects gradient field geometry shaped by backpropagation correlations rather than network architecture.

Key Contribution

Grokking is characterized as a dimensional phase transition in the gradient field, not only as memorization followed by generalization: the effective dimensionality *D* crosses 1 at the onset of generalization.

Abstract

Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a dimensional phase transition: effective dimensionality $D$ crosses from sub-diffusive (subcritical, $D < 1$) to super-diffusive (supercritical, $D > 1$) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, $D$ reflects gradient field geometry, not network architecture: synthetic i.i.d. Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized $D(t)$ crossing -- robust across topologies -- offers new insight into the trainability of overparameterized networks.
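The abstract does not spell out how $D$ is estimated. As a generic illustration of the finite-size-scaling idea only, the sketch below assumes a hypothetical observable that scales as $N^D$ with system size $N$ and recovers the exponent from the slope of a log-log least-squares fit. The function names, the synthetic data, the noise level, and the choice of sizes are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def fit_scaling_exponent(sizes, observables):
    """Return the least-squares slope of log(observable) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(s) for s in observables]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var


def synthetic_observable(n, exponent, noise, rng):
    """Hypothetical observable: exact power law n**exponent with log-normal noise."""
    return n ** exponent * math.exp(rng.gauss(0.0, noise))


if __name__ == "__main__":
    rng = random.Random(0)
    # Eight hypothetical system sizes (an assumption, not the paper's model scales).
    sizes = [2 ** k for k in range(6, 14)]
    for true_exponent in (0.8, 1.0, 1.2):
        data = [synthetic_observable(n, true_exponent, 0.05, rng) for n in sizes]
        estimate = fit_scaling_exponent(sizes, data)
        print(f"true exponent {true_exponent:.1f}, fitted exponent {estimate:.3f}")
```

Under the assumed scaling form, a fitted slope below 1 would fall on the sub-diffusive side of the abstract's classification and a slope above 1 on the super-diffusive side.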

Architecture Design (Transformers, SSMs, MoE) · Scaling Laws & Emergent Abilities · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
