Peking University
Scale vectors make up only a tiny fraction of LLM parameters, yet they are critical for pre-training; this paper shows how to improve them with simple, theoretically grounded tweaks.
By selectively amplifying updates along flat directions of the loss landscape, LITE accelerates LLM pre-training with existing matrix-based optimizers such as Muon and SOAP.