The Lion optimizer's generalization error bound is worse than you thought ($O(\frac{1}{N\tau^T})$), but a simple tweak (CLion) can fix it, achieving $O(\frac{1}{N})$ with faster convergence.
Adam's generalization problem? This paper shows how periodically "going home" to momentum-based SGD can provably beat standard Adam and AdamW in generalization error and convergence speed.