Search papers, labs, and topics across Lattice.
1
0
2
Row/column normalization *before* orthogonalization can significantly boost convergence and reduce validation perplexity in LLaMA2 pretraining, outperforming the base Muon optimizer.