This paper introduces CloudCons, an end-to-end benchmark that evaluates forecasting models by their effect on cloud resource consolidation rather than by prediction error alone, which is the sole focus of existing benchmarks. Using datasets built from Huawei Cloud, Microsoft Azure, and Google Borg workloads, the study finds that the superior zero-shot forecasting accuracy of foundation models does not by itself yield better decision utility in resource allocation. It also identifies the choice of predictive quantile as a key lever for balancing resource efficiency against service reliability, and it offers guidelines for calibrating that choice.
Foundation models may excel at forecasting, but their accuracy doesn't always translate to better resource allocation decisions in cloud environments.
Resource utilization in cloud data centers remains low because operators conservatively over-provision to guarantee service reliability. To mitigate this, the forecast-then-optimize paradigm has emerged to optimize consolidation by anticipating future demands. While emerging time series foundation models promise to enhance this paradigm through zero-shot generalization, existing benchmarks focus solely on prediction error metrics. The actual decision utility of these advanced models remains unverified, leaving their practical value for downstream tasks uncertain. To bridge this gap, we propose CloudCons, a comprehensive end-to-end benchmark designed to evaluate forecasting models within the specific context of cloud resource consolidation. We build high-quality datasets that cover diverse workloads from Huawei Cloud, Microsoft Azure, and Google Borg, capturing distinct service characteristics ranging from synchronized diurnal rhythms to stochastic, pulse-like bursts and high-frequency noise. We conduct an extensive evaluation of statistical, deep learning, and foundation models. Our experiments reveal a pivotal finding: while foundation models demonstrate superior zero-shot forecasting accuracy, this advantage does not inherently translate into better decision utility. Of practical significance, we systematically analyze how the selection of predictive quantiles acts as a critical lever on the trade-off between resource efficiency and service reliability. We provide actionable guidelines for calibrating this selection, offering vital insights for real-world deployment decisions.
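As a rough illustration of why the predictive quantile can act as a lever in a forecast-then-optimize pipeline, the toy sketch below reserves capacity for each VM at a chosen quantile of its forecast samples, packs the reservations onto hosts, and then counts how many hosts would overload under the realized demand. Everything in it is an assumption made for illustration and is not taken from CloudCons: the function names, the nearest-rank quantile, the first-fit-decreasing packing heuristic, and the synthetic numbers.

```python
import math

# Hypothetical toy example; not the CloudCons pipeline or its data.

def empirical_quantile(samples, q):
    """Nearest-rank empirical quantile for 0 < q <= 1."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def first_fit_decreasing(reservations, capacity):
    """Pack reservations onto hosts; return one list of VM indices per host."""
    hosts = []  # each entry is [remaining_capacity, [vm indices]]
    order = sorted(range(len(reservations)),
                   key=lambda i: reservations[i], reverse=True)
    for i in order:
        for host in hosts:
            if host[0] >= reservations[i]:
                host[0] -= reservations[i]
                host[1].append(i)
                break
        else:
            # No existing host has room, so open a new one.
            hosts.append([capacity - reservations[i], [i]])
    return [vms for _, vms in hosts]

def consolidate(forecast_samples, actual, capacity, q):
    """Return (hosts used, hosts overloaded by the realized demand)."""
    reservations = [empirical_quantile(s, q) for s in forecast_samples]
    placement = first_fit_decreasing(reservations, capacity)
    overloaded = sum(
        1 for vms in placement if sum(actual[i] for i in vms) > capacity
    )
    return len(placement), overloaded

# Synthetic inputs: four VMs with identical forecast samples 1..10,
# a realized demand of 6 units each, and hosts with 10 units of capacity.
forecast_samples = [list(range(1, 11)) for _ in range(4)]
actual = [6, 6, 6, 6]

if __name__ == "__main__":
    for q in (0.5, 0.9):
        print(q, consolidate(forecast_samples, actual, capacity=10, q=q))
```

On this toy input, q = 0.5 packs the four VMs onto two hosts, both of which overload, while q = 0.9 uses four hosts with no overload. A lower quantile buys efficiency at the cost of reliability, and a higher one does the reverse.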