Beijing University of Posts and Telecommunications
The Muon optimizer can now train LLMs about twice as fast as AdamW, as validated by Moonlight, a new MoE model with 3B activated and 16B total parameters.