Once inference costs are counted, optimal LLM pretraining requires *overtraining*: training a smaller model on more tokens than training-compute-only scaling laws recommend, contrary to conventional scaling wisdom.