Gradient spikes in LLM training can be 1000x larger than typical gradients. A new optimizer, SPAM, tames them with momentum reset and spike-aware clipping, improving both performance and memory efficiency.
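To make the two mechanisms named above concrete, here is a minimal sketch of an Adam-style update that periodically resets its moment estimates and clips any gradient entry that is far larger than its running second-moment estimate. This is an illustration under stated assumptions, not the SPAM implementation: the class name `SpikeAwareAdamSketch`, the parameters `spike_threshold` and `reset_interval`, their default values, and the exact clipping rule are all hypothetical, and the memory-efficiency side of the optimizer is not modeled.

```python
import math


class SpikeAwareAdamSketch:
    """Adam-style optimizer with momentum reset and spike-aware clipping.

    Illustrative only: names, defaults, and the clipping rule are assumptions.
    """

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, spike_threshold=50.0, reset_interval=500):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.spike_threshold = spike_threshold  # assumed: max allowed g^2 / v_hat
        self.reset_interval = reset_interval    # assumed: steps between resets
        self.m = [0.0] * n_params  # first-moment (momentum) estimates
        self.v = [0.0] * n_params  # second-moment estimates
        self.t = 0                 # steps since the last reset
        self.total_steps = 0       # steps since construction

    def clip_spikes(self, grads):
        """Shrink entries whose square exceeds spike_threshold * v_hat."""
        if self.t == 0:
            # No gradient history yet (fresh start or just reset): nothing to compare to.
            return list(grads)
        correction = 1.0 - self.beta2 ** self.t
        clipped = []
        for g, v in zip(grads, self.v):
            limit_sq = self.spike_threshold * (v / correction)
            if limit_sq > 0.0 and g * g > limit_sq:
                # Keep the sign, cap the magnitude at sqrt(threshold * v_hat).
                g = math.copysign(math.sqrt(limit_sq), g)
            clipped.append(g)
        return clipped

    def step(self, params, grads):
        """Return updated parameters for one optimization step."""
        # Momentum reset: discard moments that may have absorbed past spikes.
        if self.total_steps > 0 and self.total_steps % self.reset_interval == 0:
            self.m = [0.0] * len(self.m)
            self.v = [0.0] * len(self.v)
            self.t = 0
        grads = self.clip_spikes(grads)
        self.total_steps += 1
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / (1.0 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1.0 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params
```

In this sketch a spike is judged per entry against the bias-corrected second moment, so an entry can be capped without rescaling the rest of the gradient, unlike global norm clipping. The reset also restarts bias correction, so the first step after a reset behaves like the first step of a fresh Adam run.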