Theoretical Division, Los Alamos National Laboratory, USA
Ditch the polar decomposition: MUD offers a surprisingly simple and efficient alternative for momentum whitening, speeding up transformer training by up to 50% compared to AdamW and Muon.