Ditch the polar decomposition: MUD offers a surprisingly simple and efficient alternative for momentum whitening, speeding up transformer training by up to 50% compared to AdamW and Muon.
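For context, here is a minimal sketch of the baseline this summary refers to. Muon-style momentum whitening replaces the momentum matrix with its orthogonal polar factor U Vᵀ. It does not implement MUD, whose method is not described here. It assumes NumPy and uses hypothetical function names. It uses a plain cubic Newton-Schulz iteration, not Muon's tuned quintic one. It also uses simplified momentum (no Nesterov) with illustrative values lr=0.02 and beta=0.95.

```python
import numpy as np


def polar_factor_svd(m: np.ndarray) -> np.ndarray:
    """Exact orthogonal polar factor U @ Vh of m (the 'whitened' matrix)."""
    u, _, vh = np.linalg.svd(m, full_matrices=False)
    return u @ vh


def polar_factor_newton_schulz(m: np.ndarray, steps: int = 30) -> np.ndarray:
    """Approximate the polar factor with the cubic Newton-Schulz iteration.

    Dividing by the Frobenius norm puts every singular value in (0, 1],
    inside the iteration's convergence region (0, sqrt(3)). Each step maps
    singular values s -> 1.5 s - 0.5 s**3 and leaves singular vectors fixed,
    so the iterate converges to U @ Vh.
    """
    x = m / np.linalg.norm(m)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x


def whitened_momentum_step(w, grad, buf, lr=0.02, beta=0.95):
    """One simplified, hypothetical momentum step with polar-factor whitening."""
    buf = beta * buf + grad                           # accumulate momentum
    w = w - lr * polar_factor_newton_schulz(buf)      # step along whitened momentum
    return w, buf
```

Each Newton-Schulz step costs three matrix multiplications per weight matrix. That per-step overhead is what a cheaper whitening scheme would presumably aim to reduce.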