MONA adds Nesterov-style acceleration to the Muon optimizer, delivering faster LLM pretraining and stronger downstream performance than AdamW.
Asynchronous RL for LLMs doesn't have to sacrifice convergence for speed: DORA achieves 2-4x faster training by managing multiple policy versions during rollout.