Stop guessing at prefill/decode resource allocation: this method accurately predicts the optimal split for disaggregated LLM inference, balancing throughput and SLOs.