CMU ML · Mar 5, 2026 · arXiv:2603.04807

The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implicit Regularization

Tongtong Liang, Esha Singh, Rahul Parhi, Alexander Cloninger, Yu-Xiang Wang

AI Summary

This paper investigates how two inductive biases of convolutional neural networks (CNNs), locality and weight sharing, alter the implicit regularization induced by the edge-of-stability phenomenon in gradient descent. The authors prove that CNNs generalize on spherical data at a rate of $n^{-\frac{1}{6} +O(m/d)}$ when the receptive field size $m$ is small relative to the ambient dimension $d$, a regime where fully connected networks fail. The improvement arises because weight sharing couples the learned filters to the low-dimensional patch manifold, which bypasses the high dimensionality of the ambient space.

Key Contribution

Locality and weight sharing fundamentally reshape the implicit regularization of gradient descent, allowing CNNs to bypass the curse of dimensionality on difficult distributions, such as the high-dimensional sphere, where fully connected networks fail.

Abstract

We study how architectural inductive bias reshapes the implicit regularization induced by the edge-of-stability phenomenon in gradient descent. Prior work has established that for fully connected networks, the strength of this regularization is governed solely by the global input geometry; consequently, it is insufficient to prevent overfitting on difficult distributions such as the high-dimensional sphere. In this paper, we show that locality and weight sharing fundamentally change this picture. Specifically, we prove that provided the receptive field size $m$ remains small relative to the ambient dimension $d$, these networks generalize on spherical data with a rate of $n^{-\frac{1}{6} +O(m/d)}$, a regime where fully connected networks provably fail. This theoretical result confirms that weight sharing couples the learned filters to the low-dimensional patch manifold, thereby bypassing the high dimensionality of the ambient space. We further corroborate our theory by analyzing the patch geometry of natural images, showing that standard convolutional designs induce patch distributions that are highly amenable to this stability mechanism, thus providing a systematic explanation for the superior generalization of convolutional networks over fully connected baselines.
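To make the locality and weight-sharing claim concrete, the following is a minimal sketch, not the paper's construction. It assumes a 1D input, a single filter, stride 1, no padding, and a ReLU activation; the names `sample_sphere`, `extract_patches`, and `shared_filter_features` are hypothetical. It samples a point on the unit sphere in $\mathbb{R}^d$, cuts it into patches of length $m$, and applies the same filter to each one, so the learned weights only ever interact with $m$-dimensional patches rather than the full $d$-dimensional input.

```python
import math
import random


def sample_sphere(d, rng):
    """Draw one point uniformly from the unit sphere in R^d (normalized Gaussian)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def extract_patches(x, m):
    """Return all contiguous length-m patches of x (stride 1, no padding)."""
    return [x[i:i + m] for i in range(len(x) - m + 1)]


def shared_filter_features(x, w):
    """Apply the same filter w to every patch of x, followed by a ReLU."""
    return [
        max(0.0, sum(wi * pi for wi, pi in zip(w, patch)))
        for patch in extract_patches(x, len(w))
    ]


rng = random.Random(0)
d, m = 64, 4  # illustrative sizes only; m is small relative to d

x = sample_sphere(d, rng)
w = [rng.gauss(0.0, 1.0) for _ in range(m)]  # one filter, reused at every location

patches = extract_patches(x, m)
features = shared_filter_features(x, w)

print(len(patches), len(patches[0]))  # 61 4
print(len(w), "shared weights vs", d, "weights for one fully connected unit")
```

In this toy setting the filter has $m = 4$ parameters regardless of $d$, whereas a single fully connected unit would need $d = 64$. The sketch only illustrates the parameter coupling; it says nothing about the generalization rate itself.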

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 53

Year: 2026

Venue: N/A
