The paper introduces "steepest mirror flows" as a theoretical framework for analyzing steepest descent methods such as sign descent, which is closely related to Adam. The framework reveals how optimization geometry shapes learning dynamics, implicit bias, and sparsity. Using diagonal linear networks and deep diagonal linear reparameterizations, the authors show that steeper descent facilitates both saddle-point escape and feature learning, whereas gradient descent needs unrealistically large learning rates to escape saddles. Empirically, they confirm that saddle-point escape is a central challenge in fine-tuning and show that decoupled weight decay, as in AdamW, stabilizes feature learning by enforcing novel balance equations.
Adam's edge over SGD in fine-tuning might boil down to its ability to nimbly escape saddle points and enforce better feature balance, a feat standard gradient descent struggles with.
How does the choice of optimization algorithm shape a model's ability to learn features? To address this question for steepest descent methods (including sign descent, which is closely related to Adam), we introduce steepest mirror flows as a unifying theoretical framework. This framework reveals how optimization geometry governs learning dynamics, implicit bias, and sparsity, and it provides two explanations for why Adam and AdamW often outperform SGD in fine-tuning. Focusing on diagonal linear networks and deep diagonal linear reparameterizations (a simplified proxy for attention), we show that steeper descent facilitates both saddle-point escape and feature learning. In contrast, gradient descent requires unrealistically large learning rates to escape saddles, an uncommon regime in fine-tuning. Empirically, we confirm that saddle-point escape is a central challenge in fine-tuning. Furthermore, we demonstrate that decoupled weight decay, as in AdamW, stabilizes feature learning by enforcing novel balance equations. Together, these results highlight two mechanisms by which steepest descent can aid modern optimization.
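The saddle-escape claim can be illustrated with a toy sketch. This is not the paper's experiment or its continuous-time flow: it assumes a two-parameter diagonal linear model with loss 0.5 * (u * v - 1)^2, which has a saddle at u = v = 0, and compares discrete gradient descent with sign descent at the same small learning rate. The function names, initialization, learning rate, and escape threshold are all illustrative assumptions.

```python
# Toy comparison: gradient descent vs. sign descent leaving a saddle.
# Assumed model (not from the paper): loss(u, v) = 0.5 * (u * v - 1)^2,
# which has a saddle point at the origin.

def loss(u, v):
    return 0.5 * (u * v - 1.0) ** 2

def grads(u, v):
    residual = u * v - 1.0
    return residual * v, residual * u  # dL/du, dL/dv

def sign(x):
    # Returns -1, 0, or 1.
    return (x > 0) - (x < 0)

def steps_to_escape(transform, u0=1e-4, v0=1e-4, lr=0.01,
                    tol=0.125, max_steps=100_000):
    """Count updates until the loss drops below tol.

    transform maps each raw gradient coordinate to the update direction:
    the identity gives gradient descent, sign gives sign descent.
    """
    u, v = u0, v0
    for step in range(max_steps):
        if loss(u, v) < tol:
            return step
        gu, gv = grads(u, v)
        u, v = u - lr * transform(gu), v - lr * transform(gv)
    return max_steps

gd_steps = steps_to_escape(lambda g: g)
sign_steps = steps_to_escape(sign)
print(f"gradient descent: {gd_steps} steps, sign descent: {sign_steps} steps")
```

Near the origin the gradient is proportional to the tiny parameter values, so gradient descent grows them only by a factor of about (1 + lr) per step, while sign descent moves each coordinate by a fixed lr per step regardless of gradient magnitude. Raising lr shortens the gradient descent escape time in this toy, which is consistent with the abstract's remark that gradient descent needs large learning rates to escape saddles.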