The Muon optimizer is reported to train LLMs with roughly twice the compute efficiency of AdamW, as validated by Moonlight, a new MoE model with 3B activated and 16B total parameters.