Applying row/column normalization *before* orthogonalization can significantly speed up convergence and reduce validation perplexity in LLaMA2 pretraining, outperforming the base Muon optimizer.
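As a rough illustration of the ordering described in this summary, the sketch below normalizes a gradient matrix and only then orthogonalizes it. It is not the paper's implementation. The function names are hypothetical, the "rows first, then columns" scheme is an assumption about what row/column normalization means here, and the plain cubic Newton-Schulz iteration with many steps stands in for the tuned quintic variant that Muon runs for a handful of steps.

```python
import numpy as np


def row_col_normalize(G, eps=1e-12):
    """Scale each row, then each column, to unit L2 norm.

    Assumed scheme: the summary does not specify the exact normalization.
    """
    G = G / (np.linalg.norm(G, axis=1, keepdims=True) + eps)
    G = G / (np.linalg.norm(G, axis=0, keepdims=True) + eps)
    return G


def newton_schulz_orthogonalize(G, steps=50, eps=1e-12):
    """Approximate the orthogonal polar factor of G.

    Uses the classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    a simplification chosen for clarity, not Muon's tuned quintic.
    """
    # Work on the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = G.shape[0] > G.shape[1]
    X = G.T if transposed else G
    # Dividing by the Frobenius norm keeps all singular values in (0, 1],
    # which is inside the convergence region of the iteration.
    X = X / (np.linalg.norm(X) + eps)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X.T if transposed else X


def normalized_orthogonal_update(G):
    """Normalize first, orthogonalize second (the ordering the summary highlights)."""
    return newton_schulz_orthogonalize(row_col_normalize(G))
```

For a full-rank input, `normalized_orthogonal_update(G)` returns a matrix of the same shape whose rows (for wide inputs) or columns (for tall inputs) are approximately orthonormal. A real optimizer step would additionally apply momentum and a learning rate, which this sketch omits.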
No manual hyperparameter tuning needed: OptEMA adapts automatically and achieves near-optimal deterministic convergence in zero-noise stochastic optimization.