Achieve up to 5.8x LLM inference speedup by decoupling causal dependency modeling from autoregressive draft execution in speculative decoding, sidestepping the usual trade-off between draft quality and drafting cost.