School of Computer Science, Peking University
DFlare achieves up to 5.52x speedup in LLM inference by allowing draft layers to independently leverage richer target knowledge, breaking through previous capacity constraints.
Speculative decoding gets a throughput boost of up to 4.32x by using reinforcement learning to dynamically balance drafting and verification.