Row/column normalization *before* orthogonalization can significantly boost convergence and reduce validation perplexity in LLaMA2 pretraining, outperforming the base Muon optimizer.
A single normalization step turns Muon into Muon+, delivering consistent perplexity improvements in LLM pre-training.
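The idea of normalizing before orthogonalizing can be illustrated with a minimal NumPy sketch. It is not the Muon or Muon+ implementation: the function names (`normalize_rows`, `normalize_cols`, `newton_schulz_orthogonalize`, `normalized_orthogonal_update`) are hypothetical, and the plain cubic Newton-Schulz iteration is swapped in as a simple stand-in for whatever orthogonalization routine the optimizer actually uses. Only the ordering (row or column normalization first, orthogonalization second) is taken from the summary above.

```python
import numpy as np


def normalize_rows(m, eps=1e-8):
    """Scale every row of m to unit L2 norm (hypothetical helper)."""
    return m / (np.linalg.norm(m, axis=1, keepdims=True) + eps)


def normalize_cols(m, eps=1e-8):
    """Scale every column of m to unit L2 norm (hypothetical helper)."""
    return m / (np.linalg.norm(m, axis=0, keepdims=True) + eps)


def newton_schulz_orthogonalize(m, steps=30, eps=1e-8):
    """Approximate the orthogonal polar factor of m.

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    chosen here for simplicity; it is an assumption, not the routine
    from the source.
    """
    # Dividing by the Frobenius norm keeps all singular values <= 1,
    # which is inside the convergence region of the iteration.
    x = m / (np.linalg.norm(m) + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x


def normalized_orthogonal_update(momentum, mode="row", steps=30):
    """Normalize rows or columns first, then orthogonalize."""
    if mode == "row":
        pre = normalize_rows(momentum)
    elif mode == "col":
        pre = normalize_cols(momentum)
    else:
        raise ValueError("mode must be 'row' or 'col'")
    return newton_schulz_orthogonalize(pre, steps=steps)
```

In an optimizer step, `momentum` would be the 2-D momentum buffer of a weight matrix and the returned matrix would replace the raw update direction. Whether row or column normalization is the better choice for a given layer is not stated in the summary, so `mode` is left as a parameter.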