The secret to effectively pruning LLMs might not be *how* you search for redundant layers, but *what* you're optimizing for.
Diffusion language models can achieve up to 26x inference speedups with almost no accuracy loss, thanks to a clever entropy-based KV caching strategy that avoids costly full forward passes.