Duke University
Diffusion LLMs can reach up to 6.1x higher throughput than autoregressive models by adjusting their decoding granularity to real-time load, something fixed-block decoding approaches cannot do.
Pre-materializing CUDA graph execution contexts cuts LLM cold starts from minutes to seconds, avoiding both brittle kernel patching and heavyweight checkpointing.