Quantizing optimizer states in LLM pre-training introduces "staleness," but strategically timed resets can recover lost performance and reduce memory footprint.