Achieves a 75% reduction in LLM input length with minimal performance loss by compressing token embeddings directly in the latent space.
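A 75% length reduction corresponds to a 4:1 compression ratio, so every four input embeddings become one. The sketch below illustrates only that arithmetic. It assumes the simplest possible latent compressor, mean pooling over each group of four consecutive token embeddings, which is a stand-in chosen for illustration and not the method summarized above. The names `pool_embeddings` and `length_reduction` are hypothetical.

```python
from typing import List

Vector = List[float]


def pool_embeddings(embeddings: List[Vector], ratio: int = 4) -> List[Vector]:
    """Hypothetical stand-in compressor: average each group of `ratio`
    consecutive token embeddings into one vector (the last group may be shorter)."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    compressed: List[Vector] = []
    for start in range(0, len(embeddings), ratio):
        group = embeddings[start:start + ratio]
        dim = len(group[0])
        # Element-wise mean over the vectors in this group.
        compressed.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return compressed


def length_reduction(original_len: int, compressed_len: int) -> float:
    """Fraction of input positions removed by compression."""
    return 1.0 - compressed_len / original_len


if __name__ == "__main__":
    # 16 toy "token embeddings" of dimension 3.
    tokens = [[float(i)] * 3 for i in range(16)]
    pooled = pool_embeddings(tokens, ratio=4)
    print(len(pooled), length_reduction(len(tokens), len(pooled)))
```

Mean pooling discards positional detail within each group; a learned latent compressor would aim to preserve more of it, but the sequence-length accounting is the same.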
By enabling draft models to "contemplate the future," ConFu achieves significant speedups in speculative decoding, outperforming EAGLE-3 by 8-11% on Llama-3 models.
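In speculative decoding, a cheap draft model proposes several tokens and the larger target model verifies them, keeping the longest agreeing prefix. The sketch below shows the generic greedy draft-and-verify step with toy integer-token models standing in for real LLMs. It does not reproduce ConFu's "contemplate the future" mechanism or EAGLE-3. `speculative_step`, `toy_draft`, and `toy_target` are hypothetical names.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken,
                     k: int = 4) -> Tuple[List[int], int]:
    """One greedy speculative-decoding step.

    Returns the extended sequence and how many draft tokens were accepted."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target verifies each position. A real system scores all k
    #    positions in one batched forward pass; this toy calls it per position.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return list(prefix) + accepted, len(accepted) - 1
        accepted.append(tok)
        ctx.append(tok)

    # 3. All k drafts accepted: the target contributes one bonus token.
    accepted.append(target(ctx))
    return list(prefix) + accepted, k


def toy_target(ctx: List[int]) -> int:
    """Toy target model: next token is (last + 1) mod 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 5."""
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step([1], toy_draft, toy_target, k=4))
    print(speculative_step([3], toy_draft, toy_target, k=4))
```

Under greedy verification, the output always matches what the target would produce on its own; the speedup comes from accepting several draft tokens per target pass, which is why a better-aligned draft model raises throughput.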