Forget balanced trees: exploiting the quality gap between token sources in speculative decoding unlocks 33% faster inference without sacrificing accuracy.
Language models are increasingly doing their real work in the "invisible" latent space, not the tokens we see.