Mar 9, 2026 · arXiv:2603.07917

SageSched: Efficient LLM Scheduling Confronting Demand Uncertainty and Hybridity

Zhenghao Gan, Yichen Bao, Yifei Liu, Chen Chen, Quan Chen, Minyi Guo

AI Summary

SageSched is an LLM scheduler designed to handle two properties of LLM inference: demand uncertainty (output length is unknown in advance) and hybrid resource requirements (requests are both compute and memory intensive). It predicts the output-length distribution of a request by combining the prompt contents with past inference results. It then feeds these predictions, together with a service-cost model covering both compute and memory, into an uncertainty-aware scheduling policy that targets overall efficiency. Testbed experiments show an efficiency improvement of over 28.7%.
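As a rough illustration of predicting an output-length distribution from prompt contents and past inference results, the sketch below builds an empirical distribution from the observed output lengths of the most similar past prompts. It is not the paper's predictor: the word-level Jaccard similarity, the nearest-neighbour lookup, the parameter `k`, and the function names are all assumptions made for this example.

```python
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity (an assumed stand-in for prompt similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def predict_length_distribution(prompt, history, k=3):
    """Return {output_length: probability} from the k most similar past prompts.

    history is a list of (past_prompt, observed_output_length) pairs.
    Hypothetical helper: the summary does not describe the actual predictor.
    """
    ranked = sorted(history, key=lambda h: jaccard(prompt, h[0]), reverse=True)
    counts = Counter(length for _, length in ranked[:k])
    total = sum(counts.values())
    return {length: c / total for length, c in sorted(counts.items())}


history = [
    ("translate this sentence to french", 20),
    ("translate this sentence to german", 20),
    ("translate this paragraph to french", 40),
    ("write a long essay about history", 800),
]
dist = predict_length_distribution("translate this sentence to spanish", history)
```

Returning a distribution rather than a single point estimate lets a downstream scheduler account for how uncertain the prediction is.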

Key Contribution

Beat the LLM inference bottleneck: SageSched's uncertainty-aware scheduling boosts efficiency by nearly 30% by predicting output length and balancing compute and memory demands.

Abstract

Efficient LLM inference scheduling is crucial for user experience. However, LLM inferences exhibit remarkable demand uncertainty (the output length is unknown beforehand) and hybridity (they are both compute and memory intensive). Existing LLM schedulers rely on simple heuristics or focus purely on compute resources, and thus suffer from suboptimal performance. In this work, we propose SageSched, an efficient LLM scheduler that properly handles the demand uncertainty and hybridity of inference workloads. SageSched combines prompt contents with past inference results to predict the output-length distribution in a lightweight yet accurate manner. Meanwhile, it models the true service cost of an inference request with both compute and memory aspects considered. Finally, SageSched employs an uncertainty-aware scheduling policy that can yield the best overall efficiency given the request cost distributions. Testbed experiments over diverse setups confirm that SageSched can attain an efficiency improvement of over 28.7%.
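To make the idea of a hybrid service cost concrete, here is a minimal sketch under stated assumptions. The abstract gives neither the cost formula nor the scheduling policy, so everything below is hypothetical: compute cost is taken as the number of tokens processed, memory cost as the KV-cache occupancy summed over decode steps, `alpha` and `beta` are arbitrary weights, and a plain smallest-expected-cost-first ordering is swapped in for the paper's uncertainty-aware policy.

```python
def service_cost(prompt_len, output_len, alpha=1.0, beta=0.01):
    """Assumed hybrid cost: weighted sum of a compute term and a memory term."""
    # Compute term (assumption): one unit of work per prompt or output token.
    compute = prompt_len + output_len
    # Memory term (assumption): cached tokens held at each decode step,
    # i.e. sum over t = 1..output_len of (prompt_len + t).
    memory = output_len * prompt_len + output_len * (output_len + 1) // 2
    return alpha * compute + beta * memory


def expected_cost(prompt_len, length_dist, alpha=1.0, beta=0.01):
    """Expectation of service_cost over a {output_length: probability} dict."""
    return sum(p * service_cost(prompt_len, n, alpha, beta)
               for n, p in length_dist.items())


def schedule(requests):
    """Order requests by expected cost, cheapest first (illustrative policy only)."""
    return sorted(requests,
                  key=lambda r: expected_cost(r["prompt_len"], r["length_dist"]))


requests = [
    {"id": "B", "prompt_len": 10, "length_dist": {2: 0.5, 100: 0.5}},
    {"id": "A", "prompt_len": 10, "length_dist": {4: 1.0}},
]
order = [r["id"] for r in schedule(requests)]
```

In this toy setup, request B is most likely short but carries a long tail, so its expected cost exceeds that of A. Ranking by the mean alone ignores the shape of the distribution, which is the kind of information an uncertainty-aware policy can exploit.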

Distributed Systems & Hardware · Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 49

Year: 2026

Venue: N/A
