Search papers, labs, and topics across Lattice.
1
0
2
Skip the expensive per-sample loss tracking: a simple linear predictor using only token frequency statistics can accelerate language model training by up to 13%.