University of Kentucky
SpecKV adaptively tunes speculation length based on the draft model's confidence, achieving a 56% speedup over fixed-length speculation. This adaptivity is especially important for compressed models.
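
To make the idea of confidence-driven speculation length concrete, here is a minimal sketch in Python. It is not SpecKV's actual algorithm, which the summary does not describe; it assumes a simple stopping rule in which drafting ends as soon as the draft model's confidence in a token falls below a threshold. The names `draft_until_unsure`, `make_toy_draft`, `min_conf`, and `max_len` are all hypothetical, and the toy draft model just replays a fixed list of confidences.

```python
from typing import Callable, List, Sequence, Tuple

# A draft step maps the current context to (next_token, confidence in [0, 1]).
DraftStep = Callable[[List[int]], Tuple[int, float]]


def draft_until_unsure(
    draft_step: DraftStep,
    prefix: Sequence[int],
    min_conf: float = 0.6,  # hypothetical threshold, not taken from the paper
    max_len: int = 8,       # hypothetical cap on speculation length
) -> List[int]:
    """Draft tokens until confidence drops below min_conf or max_len is reached.

    The first low-confidence token is still included, so the target model
    always has at least one drafted token to verify.
    """
    context = list(prefix)
    drafted: List[int] = []
    while len(drafted) < max_len:
        token, conf = draft_step(context)
        drafted.append(token)
        context.append(token)
        if conf < min_conf:
            break  # the draft model is unsure: stop speculating early
    return drafted


def make_toy_draft(prefix_len: int, confs: Sequence[float]) -> DraftStep:
    """Build a stand-in draft model that replays fixed confidences (illustration only)."""
    def step(context: List[int]) -> Tuple[int, float]:
        i = len(context) - prefix_len  # number of tokens drafted so far
        return 10 + i, confs[i]
    return step


if __name__ == "__main__":
    toy = make_toy_draft(2, [0.9, 0.8, 0.3, 0.95])
    print(draft_until_unsure(toy, [1, 2]))
```

With a fixed-length scheme the draft model would always propose `max_len` tokens; under this rule the number of drafted tokens shrinks when the draft model is unsure, so fewer tokens are likely to be rejected during verification.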