Search papers, labs, and topics across Lattice.
Duke University
1
0
2
Diffusion LLMs can achieve up to 6.1x higher throughput than autoregressive models by dynamically adjusting decoding granularity based on real-time load, a feat unattainable with fixed-block approaches.