Stop wasting compute on easy and impossible examples: PACED distillation focuses your student model's training on the sweet spot where it actually learns.
Forget slow attention: FlashPrefill achieves a staggering 27x speedup in long-context prefilling by instantly discovering and thresholding sparse attention patterns.
Reasoning models aren't just verbose; they're actively *harmed* by their own verbosity. A simple self-distillation trick can compress their outputs by up to 59% while boosting accuracy by up to 16 points.