This paper derives generalization bounds for stochastic gradient descent (SGD) in homogeneous neural networks. It shows that these networks admit a slower stepsize decay, of order $\Omega(1/\sqrt{t})$, than the $\mathcal{O}(1/t)$ decay typically required by algorithmic-stability arguments in non-convex settings. This relaxes a rigid decay schedule that can hinder optimization in practice. The analysis is extended to non-Lipschitz regimes and applies broadly to fully-connected and convolutional networks with ReLU and LeakyReLU activations.
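To see why the decay rate matters, consider the standard worst-case argument used in stability analyses of $\beta$-smooth, non-convex losses. On a step where two SGD runs see the same example, the update can increase the distance between them by at most a factor $(1 + \beta\eta_t)$, so after $T$ steps the accumulated factor is $\prod_{t=1}^{T}(1 + \beta\eta_t) \le \exp(\beta\sum_{t=1}^{T}\eta_t)$. With $\eta_t = c/t$ this grows only polynomially in $T$, whereas with $\eta_t = c/\sqrt{t}$ it grows like $\exp(\Theta(\sqrt{T}))$. The Python sketch below computes both quantities. The function names and the values of $\beta$, $c$, and $T$ are illustrative assumptions, and this is a generic argument rather than the paper's bound; the paper's contribution is to show that homogeneous networks still admit generalization bounds under the slower decay.

```python
import math

# Two illustrative stepsize schedules; the constant c is an assumed value.
def eta_inv_t(t, c=1.0):
    """O(1/t) decay, the rate usually required by non-convex stability bounds."""
    return c / t

def eta_inv_sqrt_t(t, c=1.0):
    """Slower 1/sqrt(t) decay, the order of decay discussed for homogeneous nets."""
    return c / math.sqrt(t)

def total_stepsize(eta, T):
    """Sum of eta_t for t = 1..T."""
    return math.fsum(eta(t) for t in range(1, T + 1))

def log_expansion(eta, beta, T):
    """log of prod_{t=1}^{T} (1 + beta * eta_t): the worst-case factor by which
    two SGD runs on the same examples can drift apart under a beta-smooth loss."""
    return math.fsum(math.log1p(beta * eta(t)) for t in range(1, T + 1))

if __name__ == "__main__":
    beta, T = 1.0, 10_000  # assumed smoothness constant and horizon
    for name, eta in [("c/t", eta_inv_t), ("c/sqrt(t)", eta_inv_sqrt_t)]:
        print(f"{name:>10}: sum eta_t = {total_stepsize(eta, T):8.2f}, "
              f"log growth = {log_expansion(eta, beta, T):8.2f}")
```

With $\beta = c = 1$ the $c/t$ product telescopes, $\prod_{t=1}^{T}(1 + 1/t) = T + 1$, so its log growth is $\ln(T+1) \approx 9.21$ for $T = 10{,}000$. The $1/\sqrt{t}$ schedule gives a log growth above 190, i.e. a worst-case factor larger than $e^{190}$, which is why generic non-convex stability arguments insist on $\mathcal{O}(1/t)$ decay.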
Homogeneous neural networks can be trained with provable generalization guarantees using stepsizes that decay as slowly as order $1/\sqrt{t}$, rather than the $\mathcal{O}(1/t)$ decay that stability-based analyses usually require.
Algorithmic stability is among the most powerful techniques in generalization analysis. However, stability-based bounds for non-convex training usually require a stepsize $\eta_t = \mathcal{O}(1/t)$, where $t$ denotes the iteration count. This rigid decay can impede optimization and may not match the schedules used in practice. In this paper, we derive generalization bounds for homogeneous neural networks, proving that in this regime the stepsize may decay more slowly, at order $\Omega(1/\sqrt{t})$, under mild assumptions. We further extend the theoretical results in several directions, e.g., to non-Lipschitz regimes. This finding is broadly applicable, as homogeneous neural networks encompass fully-connected and convolutional neural networks with ReLU and LeakyReLU activations.
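The applicability claim rests on the fact that bias-free ReLU and LeakyReLU networks are positively homogeneous in their parameters. Assuming the common definition, a network $f$ with $L$ weight layers is $L$-homogeneous if $f(c\theta; x) = c^L f(\theta; x)$ for every $c > 0$; the paper's exact definition and assumptions may differ. The pure-Python sketch below checks this identity numerically for a small fully-connected network. All function names and layer sizes are hypothetical.

```python
import math
import random

def activation(z, slope=0.0):
    # slope = 0.0 gives ReLU; 0 < slope < 1 gives LeakyReLU.
    # Both satisfy activation(c * z) == c * activation(z) for c > 0.
    return z if z > 0 else slope * z

def matvec(W, v):
    """Dense matrix-vector product with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def mlp(weights, x, slope=0.0):
    """Bias-free fully-connected net: W_L act(... act(W_1 x))."""
    h = x
    for W in weights[:-1]:
        h = [activation(z, slope) for z in matvec(W, h)]
    return matvec(weights[-1], h)

def scale(weights, c):
    """Multiply every parameter by the scalar c."""
    return [[[c * w for w in row] for row in W] for W in weights]

def random_weights(sizes, rng):
    """Gaussian weight matrices for sizes = [d_in, hidden..., d_out] (hypothetical)."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def is_homogeneous(weights, x, c, slope=0.0):
    """Check f(c * theta; x) == c**L * f(theta; x), with L = number of layers."""
    L = len(weights)
    lhs = mlp(scale(weights, c), x, slope)
    rhs = [c ** L * y for y in mlp(weights, x, slope)]
    return all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
               for a, b in zip(lhs, rhs))

if __name__ == "__main__":
    rng = random.Random(0)
    weights = random_weights([4, 8, 8, 3], rng)  # three weight layers, so L = 3
    x = [rng.gauss(0.0, 1.0) for _ in range(4)]
    for slope in (0.0, 0.1):  # ReLU, then LeakyReLU
        print(slope, is_homogeneous(weights, x, c=2.5, slope=slope))
```

The identity holds layer by layer because each weight layer is linear and ReLU/LeakyReLU satisfy $\sigma(cz) = c\,\sigma(z)$ for $c > 0$. A convolution is also linear in its kernel, so the same reasoning covers bias-free convolutional layers, whereas adding bias terms generally breaks the identity.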