Stanford HAI | Apr 22, 2026 | arXiv:2604.20446

The Origin of Edge of Stability

AI Summary

This paper investigates the Edge of Stability, the phenomenon in which full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the critical threshold $2/η$, where $η$ is the learning rate. By introducing the edge coupling, a functional on consecutive iterate pairs, the authors derive a step recurrence with stability boundary $2/η$ and a loss-change formula whose telescoping sum forces curvature toward that threshold. Together these explain why trajectories are forced toward $2/η$ from arbitrary initialization, which existing accounts of self-regulation near the edge leave open. The same framework classifies fixed points and period-two orbits of gradient descent.

Key Contribution

Along the gradient-descent trajectory, the largest Hessian eigenvalue is systematically forced toward the critical threshold $2/η$ from arbitrary initialization, rather than merely self-regulating once it is already near the edge.

Abstract

Full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the threshold $2/η$, where $η$ is the learning rate. This phenomenon, the Edge of Stability, has resisted a unified explanation: existing accounts establish self-regulation near the edge but do not explain why the trajectory is forced toward $2/η$ from arbitrary initialization. We introduce the edge coupling, a functional on consecutive iterate pairs whose coefficient is uniquely fixed by the gradient-descent update. Differencing its criticality condition yields a step recurrence with stability boundary $2/η$, and a second-order expansion yields a loss-change formula whose telescoping sum forces curvature toward $2/η$. The two formulas involve different Hessian averages, but the mean value theorem localizes each to the true Hessian at an interior point of the step segment, yielding exact forcing of the Hessian eigenvalue with no gap. Setting both gradients of the edge coupling to zero classifies fixed points and period-two orbits; near a fixed point, the problem reduces to a function of the half-amplitude alone, which determines which directions support period-two orbits and on which side of the critical learning rate they appear.
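The role of $2/η$ as a stability boundary can be seen in the standard one-dimensional quadratic model, sketched below. This is not the paper's edge-coupling construction, only the textbook special case: for $f(x) = \tfrac{1}{2} λ x^2$, one gradient-descent step gives $x_{t+1} = (1 - ηλ) x_t$, which contracts when $λ < 2/η$, flips sign forever (a period-two orbit) when $λ = 2/η$, and diverges when $λ > 2/η$. The function name `gd_on_quadratic` and the chosen values of `lr` and `curvature` are illustrative assumptions, not taken from the paper.

```python
def gd_on_quadratic(curvature, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Illustrative helper (hypothetical name); returns the list of iterates.
    """
    xs = [x0]
    for _ in range(steps):
        grad = curvature * xs[-1]      # f'(x) = curvature * x
        xs.append(xs[-1] - lr * grad)  # x_{t+1} = (1 - lr * curvature) * x_t
    return xs


lr = 0.5
threshold = 2.0 / lr  # stability boundary 2/eta for this learning rate

# Curvature below, exactly at, and above the threshold.
for curvature in (3.0, threshold, 5.0):
    xs = gd_on_quadratic(curvature, lr)
    print(f"curvature={curvature:.1f}  multiplier={1 - lr * curvature:+.2f}  "
          f"|x_final|={abs(xs[-1]):.3e}")
```

Below the threshold the iterates shrink toward the minimum, exactly at the threshold they alternate between `x0` and `-x0`, and above it their magnitude grows geometrically. In a neural network the curvature changes along the trajectory, which is the setting the abstract addresses; the quadratic case only shows where the number $2/η$ comes from.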

Architecture Design (Transformers, SSMs, MoE) | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
