Shanghai Jiao Tong University
AMPD is a new disaggregated serving framework that accelerates multi-round LLM inference by intelligently managing interleaved prefill-decode workloads.
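The blurb does not describe AMPD's internals, but the core idea of disaggregated serving is to run the prefill phase (processing new prompt tokens) and the decode phase (generating output tokens) on separate worker pools, so that a long prefill from a follow-up turn does not stall ongoing decodes. The sketch below is a minimal, generic illustration of that routing under assumed names (`Request`, `WorkerPool`, `DisaggregatedRouter` are hypothetical and not taken from the paper); it is not AMPD's actual scheduler.

```python
# Illustrative sketch of generic disaggregated prefill/decode routing.
# NOT AMPD's implementation; all class and field names are assumptions.
from dataclasses import dataclass
from collections import deque


@dataclass
class Request:
    req_id: int
    round_idx: int        # which conversation round this turn belongs to
    new_tokens: int       # new prompt tokens that still need prefill
    needs_prefill: bool = True


class WorkerPool:
    """A pool of workers dedicated to one phase (prefill or decode)."""
    def __init__(self, name: str, num_workers: int):
        self.name = name
        self.num_workers = num_workers
        self.queue = deque()  # pending Requests for this phase

    def submit(self, req: Request) -> None:
        self.queue.append(req)


class DisaggregatedRouter:
    """Routes interleaved multi-round traffic to separate prefill/decode pools."""
    def __init__(self):
        self.prefill_pool = WorkerPool("prefill", num_workers=2)
        self.decode_pool = WorkerPool("decode", num_workers=4)

    def route(self, req: Request) -> str:
        # New prompt text (including later conversation rounds) goes to the
        # prefill pool; token-by-token generation goes to the decode pool.
        if req.needs_prefill and req.new_tokens > 0:
            self.prefill_pool.submit(req)
            return "prefill"
        self.decode_pool.submit(req)
        return "decode"


if __name__ == "__main__":
    router = DisaggregatedRouter()
    # Round 2 of a conversation: the new user turn is prefilled first,
    # then the same request moves to the decode pool for generation.
    turn = Request(req_id=1, round_idx=2, new_tokens=128)
    print(router.route(turn))   # -> "prefill"
    turn.needs_prefill = False
    print(router.route(turn))   # -> "decode"
```

In multi-round workloads the two phases interleave per request (each new user turn triggers another prefill), which is why a disaggregated design needs request-level routing like this rather than a one-time assignment.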