This paper introduces a black-box online controller that maximizes LLM goodput (the throughput of requests that meet the service-level objective) by hill climbing on end-to-end measurements. Because the controller requires no internal instrumentation, it is broadly applicable. The authors report empirical evidence that the design is well-founded, and use it as a concrete example to argue for including system performance and sustainability metrics in Factsheets for organizations adopting AI systems.
Maximize your LLM's goodput without diving into its internals: a new black-box controller uses hill climbing on end-to-end measurements to optimize performance.
In this paper, we present a novel black-box online controller that maximizes goodput, defined as the throughput of requests that satisfy the service-level objective. The controller applies hill climbing to end-to-end measurements taken over short segments and requires no internal instrumentation. We provide empirical evidence that this design is well-founded. Using this advance in LLM serving as a concrete example, we then discuss the importance of integrating system performance and sustainability metrics into Factsheets for organizations adopting AI systems.
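The abstract does not say which serving parameter the controller tunes or how it handles measurement noise, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single integer knob (hypothetically, a concurrency limit) and a caller-supplied `measure` function that runs one short segment at a given setting and returns the observed goodput. The names `goodput`, `hill_climb`, `measure`, and all parameters are illustrative.

```python
from typing import Callable, Sequence


def goodput(latencies_s: Sequence[float], slo_s: float, segment_s: float) -> float:
    """Requests per second that met the latency SLO during one segment."""
    met = sum(1 for latency in latencies_s if latency <= slo_s)
    return met / segment_s


def hill_climb(
    measure: Callable[[int], float],
    start: int,
    lo: int,
    hi: int,
    step: int = 1,
    max_segments: int = 50,
) -> int:
    """Greedy hill climbing over one integer knob (an assumed control variable).

    `measure(setting)` is a hypothetical black-box probe: it serves traffic
    for one short segment at `setting` and returns the goodput it observed.
    """
    current = start
    best = measure(current)
    direction = 1   # +1 tries larger settings, -1 tries smaller ones
    failures = 0    # consecutive probes that did not improve goodput

    for _ in range(max_segments):
        candidate = min(hi, max(lo, current + direction * step))
        improved = False
        if candidate != current:
            score = measure(candidate)
            improved = score > best
        if improved:
            current, best = candidate, score
            failures = 0
        else:
            # No gain (or hit a bound): try the other direction next.
            direction = -direction
            failures += 1
            if failures >= 2:
                break  # neither neighbor is better: local optimum
    return current
```

With a noise-free, single-peaked `measure`, the loop stops at the peak once both neighbors have been probed and rejected. A real deployment would see noisy measurements and shifting load, so it would likely need repeated or averaged segments and continuous re-probing; those choices are outside what the abstract specifies.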