This paper addresses the challenge of optimizing memory allocation in distributed systems by developing a predictive model based on a LightGBM and XGBoost ensemble. The model is trained to predict high conditional quantiles of memory usage and incorporates a multiplicative safety factor to mitigate the high cost of underallocations. Applied to a real-world dataset of build jobs from SAP, the proposed method reduces under-allocated jobs from 4.17% to 2.89% and average overallocation from 148% to 44.51%.
Cut average memory overallocation from 148% to 44.51% while *decreasing* under-allocated jobs? This predictive allocation method does it.
In modern distributed systems, efficient resource allocation is vital for maintaining scalability, reducing operational costs, and ensuring fast execution even across heterogeneous workloads. Predictive models for resource usage are essential tools for optimizing allocation and preventing system bottlenecks. A key challenge in predictive memory allocation is its asymmetric costs: underallocation causes failures, while overallocation wastes memory. We propose a regression method based on a LightGBM and XGBoost ensemble trained to predict high conditional quantiles. To further account for the high cost of underallocations, we add a multiplicative safety factor. With our method, we reduce the number of under-allocated jobs from $4.17\%$ to $2.89\%$ and average overallocation from $148\%$ to $44.51\%$ on a real-world dataset of build jobs provided by SAP. We further explore the Pareto frontier between optimizing for underallocation and optimizing for overallocation.
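The asymmetric objective described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: it shows the generic pinball (quantile) loss that quantile regressors typically minimize, plus a hypothetical `allocation_metrics` helper that applies a multiplicative safety factor and reports an underallocation rate and a mean relative overallocation. The function names, the metric definitions (in particular, averaging overallocation only over jobs that received enough memory), and the sample numbers are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean pinball (quantile) loss at level q, with 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff > 0) is weighted by q, over-prediction by 1 - q,
        # so a high q pushes predictions toward the upper tail of memory usage.
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def allocation_metrics(
    used: Sequence[float],
    predicted: Sequence[float],
    safety_factor: float = 1.0,
) -> Tuple[float, float]:
    """Return (underallocation rate, mean relative overallocation).

    Assumed definitions (not taken from the paper):
    - a job is under-allocated if allocated memory < memory actually used;
    - overallocation is (allocated - used) / used, averaged over the
      jobs that were not under-allocated.
    """
    # The multiplicative safety factor scales every prediction upward.
    allocated = [p * safety_factor for p in predicted]
    under = sum(1 for u, a in zip(used, allocated) if a < u)
    excess = [(a - u) / u for u, a in zip(used, allocated) if a >= u]
    under_rate = under / len(used)
    mean_over = sum(excess) / len(excess) if excess else 0.0
    return under_rate, mean_over


if __name__ == "__main__":
    # Made-up memory figures (e.g. in MiB), purely illustrative.
    used = [100.0, 200.0, 400.0, 800.0]
    predicted = [120.0, 180.0, 500.0, 700.0]
    for factor in (1.0, 1.2):
        rate, over = allocation_metrics(used, predicted, safety_factor=factor)
        print(f"safety_factor={factor}: under_rate={rate:.2f}, mean_over={over:.4f}")
```

At `q = 0.9`, under-predicting a job's memory by 10 units costs nine times as much as over-predicting it by 10 units, which is the asymmetry the abstract motivates. Sweeping `q` and `safety_factor` and recording both metrics for each setting is one simple way to trace out a trade-off curve of the kind the Pareto analysis refers to.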