Forget static policies: Autopoiesis uses LLMs to continuously rewrite serving policy code, adapting to runtime dynamics in ways human-designed systems can't.
Multi-round LLM inference gets a major speed boost with AMPD, a new disaggregated serving framework that intelligently manages interleaved prefill-decode workloads.
Video diffusion models run 18.6x faster at 97% attention sparsity with a learned router that decides when to apply sparse versus linear attention and how to combine them, outperforming heuristic approaches.