Mar 31, 2026 | arXiv:2603.29262

Grokking From Abstraction to Intelligence

Junjie Zhang, Zhen Shen, Gang Xiong, Xisong Dong

AI Summary

This paper investigates the mechanistic origins of grokking, arguing that it arises from a spontaneous simplification of internal model structures driven by parsimony. The authors combine causal, spectral, and algorithmic complexity measures with Singular Learning Theory to show that grokking corresponds to the physical collapse of redundant manifolds and deep information compression. This offers a new perspective on model overfitting and generalization.

Key Contribution

Grokking is not merely a matter of local circuits or optimization tricks; it is a global structural collapse of redundant model manifolds, which reveals a deep connection between compression and generalization.

Abstract

Grokking in modular arithmetic has established itself as the quintessential fruit fly experiment, serving as a critical domain for investigating the mechanistic origins of model generalization. Despite its significance, existing research remains narrowly focused on specific local circuits or optimization tuning, largely overlooking the global structural evolution that fundamentally drives this phenomenon. We propose that grokking originates from a spontaneous simplification of internal model structures governed by the principle of parsimony. We integrate causal, spectral, and algorithmic complexity measures alongside Singular Learning Theory to reveal that the transition from memorization to generalization corresponds to the physical collapse of redundant manifolds and deep information compression, offering a novel perspective for understanding the mechanisms of model overfitting and generalization.
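For readers unfamiliar with the modular arithmetic setting the abstract refers to, the task can be sketched as follows. This is only an illustrative sketch: the abstract does not state the operation, the modulus, or the train/test split, so modular addition, `p=97`, `train_fraction=0.3`, and the function name `modular_addition_dataset` are all assumptions made here rather than details taken from the paper.

```python
import random


def modular_addition_dataset(p=97, train_fraction=0.3, seed=0):
    """Build every pair (a, b) with label (a + b) mod p, then split at random.

    p, train_fraction, and seed are illustrative assumptions, not values
    reported in the abstract.
    """
    # Enumerate the full p * p input space; each example is ((a, b), label).
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(examples)

    # A small training fraction leaves most of the table held out.
    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]


train, test = modular_addition_dataset()
print(len(train), len(test))
```

In this kind of setup, a model that memorizes fits `train` while failing on `test`, and grokking refers to the later, delayed improvement on the held-out pairs.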

Architecture Design (Transformers, SSMs, MoE) | Interpretability & Mechanistic Interp | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
