AgentServe achieves up to a 2.8x improvement in time-to-first-token and 2.7x in time-per-output-token for agentic workloads on a single GPU by strategically isolating prefills and decodes.
Stop leaving performance on the table: jointly optimizing resource allocation and request batching with reinforcement learning can yield up to 24x speedups for multi-tenant GPU inference.