University of Health and Rehabilitation Sciences · Feb 26, 2026 · arXiv:2602.22936

Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks

Wenquan Ma, Wen-Xiu Ma, Yang Sui, Jiaye Teng, Bohan Wang, Jing Xu, Jingqin Yang

AI Summary

This paper derives generalization bounds for stochastic gradient descent (SGD) in homogeneous neural networks, demonstrating that these networks allow for a slower stepsize decay of order $\Omega(1/\sqrt{t})$ compared to the typical $\mathcal{O}(1/t)$ required for algorithmic stability in non-convex settings. This result addresses the limitation of rigid stepsize decay that can hinder optimization in practice. The analysis is extended to non-Lipschitz regimes and applies broadly to fully-connected and convolutional networks with ReLU and LeakyReLU activations.
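To make the gap between the two schedules concrete, the following is a minimal sketch that compares the cumulative stepsize $\sum_{t=1}^{T} \eta_t$ under $\eta_t = \eta_0/t$ and $\eta_t = \eta_0/\sqrt{t}$. The function names, the base stepsize `eta0`, and the horizon `T` are illustrative assumptions and are not taken from the paper; the sketch only illustrates the schedules, not the paper's bounds.

```python
import math


def inverse_t(t, eta0=1.0):
    """Stepsize eta_t = eta0 / t (hypothetical helper, not from the paper)."""
    return eta0 / t


def inverse_sqrt_t(t, eta0=1.0):
    """Stepsize eta_t = eta0 / sqrt(t) (hypothetical helper, not from the paper)."""
    return eta0 / math.sqrt(t)


def cumulative_stepsize(schedule, num_steps):
    """Sum of eta_t over t = 1..num_steps, a rough proxy for total movement budget."""
    return sum(schedule(t) for t in range(1, num_steps + 1))


T = 10_000  # illustrative horizon
total_fast_decay = cumulative_stepsize(inverse_t, T)       # grows like ln(T)
total_slow_decay = cumulative_stepsize(inverse_sqrt_t, T)  # grows like 2 * sqrt(T)

print(f"sum of eta0/t       over {T} steps: {total_fast_decay:.2f}")
print(f"sum of eta0/sqrt(t) over {T} steps: {total_slow_decay:.2f}")
```

The first sum grows only logarithmically in $T$, while the second grows like $2\sqrt{T}$, which is one way to see why a $1/t$ schedule can leave the optimizer with very little room to move.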

Key Contribution

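Homogeneous neural networks trained with SGD admit provable generalization guarantees under stepsizes that decay as slowly as order $\Omega(1/\sqrt{t})$, rather than the $\mathcal{O}(1/t)$ decay that stability analyses typically require in non-convex settings.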

Abstract

Algorithmic stability is among the most potent techniques in generalization analysis. However, its derivation usually requires a stepsize $\eta_t = \mathcal{O}(1/t)$ in non-convex training regimes, where $t$ denotes the iteration index. Such a rigid stepsize decay can impede optimization and may not match practical training schedules. In this paper, we derive generalization bounds for homogeneous neural networks, proving that this regime permits a slower stepsize decay of order $\Omega(1/\sqrt{t})$ under mild assumptions. We further extend the theoretical results in several directions, e.g., to non-Lipschitz regimes. The finding is broadly applicable, as homogeneous neural networks encompass fully-connected and convolutional neural networks with ReLU and LeakyReLU activations.
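The abstract does not spell out the homogeneity condition. Assuming the standard notion of positive homogeneity in the parameters, $f(x; c\theta) = c^{L} f(x; \theta)$ for every $c > 0$, a bias-free fully-connected network with $L$ layers and LeakyReLU activations satisfies it. The sketch below checks this numerically in plain Python; the layer sizes, the scale factor `c`, and all helper names are illustrative assumptions rather than the paper's setup.

```python
import math
import random


def leaky_relu(v, slope=0.01):
    """Elementwise LeakyReLU; positively homogeneous of degree 1."""
    return [z if z > 0 else slope * z for z in v]


def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * z for w, z in zip(row, v)) for row in W]


def forward(weights, x):
    """Bias-free fully-connected net: activation after every layer except the last."""
    h = x
    for W in weights[:-1]:
        h = leaky_relu(matvec(W, h))
    return matvec(weights[-1], h)


def scale(weights, c):
    """Multiply every parameter by c."""
    return [[[c * w for w in row] for row in W] for W in weights]


random.seed(0)
sizes = [4, 5, 3, 1]  # input dim 4, two hidden layers, scalar output (assumed)
weights = [
    [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
x = [random.gauss(0.0, 1.0) for _ in range(sizes[0])]

c = 2.5
depth = len(weights)  # L = 3 weight matrices
out = forward(weights, x)[0]
out_scaled = forward(scale(weights, c), x)[0]

# Scaling all parameters by c should scale the output by c ** depth.
print(math.isclose(out_scaled, c ** depth * out, rel_tol=1e-9))
```

Adding bias terms to the hidden layers would break this scaling identity in general, which is why the sketch omits them.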

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 81

Year: 2026

Venue: N/A
