School of Computer Science, Peking University
DFlare achieves up to 5.52x speedup in LLM inference by allowing draft layers to independently leverage richer target knowledge, breaking through previous capacity constraints.