Adam's generalization problem? This paper shows that periodically "going home" to momentum-based SGD can provably beat standard Adam and AdamW in both generalization error and convergence speed.