Diffusion language models can achieve up to 26x inference speedups with negligible accuracy loss by using an entropy-based KV caching strategy that avoids costly full forward passes.
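The core idea can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: it assumes cached KV entries are grouped into blocks, and that a block is only recomputed when the mean Shannon entropy of its tokens' predicted distributions exceeds a threshold (high entropy meaning the predictions are still uncertain and likely to change). The function names, the block-level grouping, and the threshold heuristic are all hypothetical.

```python
import numpy as np

def token_entropy(probs, eps=1e-12):
    # Shannon entropy of each token's predicted distribution,
    # shape (seq_len, vocab) -> (seq_len,)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def select_blocks_to_refresh(probs, block_size, threshold):
    # Mark KV-cache blocks whose mean token entropy exceeds the
    # threshold; only those blocks would get a fresh forward pass,
    # while low-entropy (settled) blocks keep their cached KV.
    ent = token_entropy(probs)
    n_blocks = int(np.ceil(len(ent) / block_size))
    refresh = []
    for b in range(n_blocks):
        block = ent[b * block_size : (b + 1) * block_size]
        refresh.append(bool(block.mean() > threshold))
    return refresh

# Toy example: 8 tokens over a vocabulary of 4.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(select_blocks_to_refresh(probs, block_size=4, threshold=1.2))
```

The savings come from skipping recomputation for blocks whose predictions have stabilized: in diffusion-style decoding, most positions converge early, so only a small fraction of the cache needs refreshing at each step.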