LLMs train 1.5x faster and generalize better with a surprisingly simple trick: adapt learning rates per-layer based on the "heavy-tailedness" of their weight matrices.