Search papers, labs, and topics across Lattice.
2
0
4
0
By enabling draft models to "contemplate the future," ConFu achieves significant speedups in speculative decoding, outperforming EAGLE-3 by 8-11% on Llama-3 models.
Diffusion language models have surprisingly redundant early layers, enabling nearly 20% FLOPs reduction at inference time via layer skipping without sacrificing performance.