Row-normalized optimizers can match Muon's performance on large language models while being faster in large-token and low-loss regimes, offering a practical alternative for pre-training.