University of Michigan
DLA adapts memory management in linear attention, preserving crucial information while reducing error accumulation over long sequences.
Cut LLM cold starts from minutes to seconds by pre-materializing CUDA graph execution contexts, sidestepping brittle kernel patching and heavyweight checkpointing.