Search papers, labs, and topics across Lattice.
1
0
2
By dynamically balancing fast adaptation and stable averaging, AMUSE delivers faster convergence and better final performance than AdamW and Muon, all without any learning rate tuning.