May 26, 2026 | arXiv:2605.26418

When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control

Guilin Zhang, Chuan Sun, Chuanyi Sun, Kai Zhao, Shahryar Sarkani, S. Sarkani, John M. Fossaceca, John Fossaceca

AI Summary

This paper introduces RLScale-Bench, a benchmark for evaluating deep reinforcement learning (DRL) agents against calibrated rule-based baselines in adaptive resource control. The authors find that a properly calibrated rule-based autoscaler achieves lower cost than each of six mainstream DRL algorithms (PPO, DQN, A2C, SAC, TD3, and DDPG) on all six workload patterns tested, although it trails the best RL agents on bursty and flash traffic. Their analysis suggests that baseline calibration, reward engineering, and realistic evaluation protocols matter more than algorithm selection for RL-based resource control.
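The summary's central point is that a rule-based autoscaler has to be calibrated before it is a fair baseline. The sketch below shows one simple way to do that, under stated assumptions: a threshold controller (add one replica when utilization exceeds `upper`, remove one when it falls below `lower`), a hypothetical cost of one unit per replica per step plus a fixed penalty for each under-provisioned step, and a grid search over thresholds on a toy diurnal trace. The controller, cost weights, trace, and function names are all illustrative assumptions, not RLScale-Bench's actual baseline or reward.

```python
import itertools
import math

def make_diurnal_trace(steps=96, base=300.0, amplitude=200.0, period=48):
    """Toy workload (hypothetical): a smooth day-like load curve, in requests per step."""
    return [base + amplitude * math.sin(2 * math.pi * t / period) for t in range(steps)]

def simulate(trace, lower, upper, capacity=100.0, unit_cost=1.0,
             violation_penalty=20.0, start=1, min_r=1, max_r=20):
    """Run a threshold autoscaler over a load trace and return its total cost.

    Assumed cost model: unit_cost per replica per step, plus violation_penalty
    for every step in which load exceeds total capacity (an SLO-miss proxy).
    """
    replicas = start
    cost = 0.0
    for load in trace:
        util = load / (replicas * capacity)
        if util > upper:
            replicas = min(max_r, replicas + 1)   # scale out by one
        elif util < lower:
            replicas = max(min_r, replicas - 1)   # scale in by one
        cost += replicas * unit_cost
        if load > replicas * capacity:
            cost += violation_penalty
    return cost

def calibrate(trace, lowers, uppers, **kw):
    """Grid-search (lower, upper) thresholds; return (best_cost, lower, upper)."""
    best = None
    for lo, hi in itertools.product(lowers, uppers):
        if lo >= hi:
            continue  # skip invalid bands
        c = simulate(trace, lo, hi, **kw)
        if best is None or c < best[0]:
            best = (c, lo, hi)
    return best

trace = make_diurnal_trace()
best_cost, best_lower, best_upper = calibrate(
    trace, [0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9])
default_cost = simulate(trace, lower=0.3, upper=0.9)  # an uncalibrated "default"
print(best_lower, best_upper, best_cost, default_cost)
```

Because the uncalibrated default (0.3, 0.9) is one of the grid points, the search can only match or improve on it, which illustrates how comparing against an untuned default can flatter an RL agent.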

Key Contribution

Turns out, a properly calibrated rule-based autoscaler probably beats, at least on cost, that fancy RL agent you're trying to deploy for adaptive resource control.

Abstract

A properly calibrated rule-based autoscaler can beat every one of six mainstream deep reinforcement learning (DRL) algorithms on cost across every workload we test. So when, if ever, does DRL actually help? We study this in RLScale-Bench, a reproducible benchmark and evaluation protocol for DRL on adaptive resource control, where an agent allocates compute to a dynamic workload under cost and service-level constraints. We evaluate PPO, DQN, A2C, SAC, TD3, and DDPG under matched architectures, training budgets, and reward functions against a calibrated rule-based baseline across six workload patterns and five seeds (240 runs), instantiate the benchmark on Kubernetes Horizontal Pod Autoscaling, and probe distribution-shift generalization. Three findings challenge common assumptions: (i) the calibrated controller achieves the lowest cost on all six workloads, though it trails the best RL agents on bursty and flash traffic; (ii) discrete-action algorithms outperform continuous-action ones by one to two orders of magnitude in constraint violations due to action-space mismatch; and (iii) no single algorithm dominates across workloads, with rankings shifting by up to four positions. The bottleneck in RL-based resource control is not algorithm selection but baseline calibration, reward engineering, and realistic evaluation protocols.
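For readers unfamiliar with the Kubernetes instantiation: the stock Horizontal Pod Autoscaler computes desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue) and skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below implements that rule in Python and, for contrast, two hypothetical mappings from agent actions to replica counts: a continuous action in [-1, 1] rounded to an integer delta, and a three-way discrete action. The mappings are one plausible form of the action-space mismatch in finding (ii), not code from RLScale-Bench; names such as `max_step` are illustrative assumptions, and the paper's calibrated baseline may differ from the stock HPA rule.

```python
import math

def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, tolerance: float = 0.1,
                         min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Kubernetes HPA rule: ceil(current * currentMetric / targetMetric),
    with no change when the ratio is within the tolerance band around 1.0."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: hold steady
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (HPA minReplicas / maxReplicas).
    return max(min_replicas, min(max_replicas, desired))

def continuous_to_replicas(action: float, current_replicas: int, max_step: int = 3,
                           min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Hypothetical mapping for continuous-action agents (e.g. SAC, TD3, DDPG):
    clip to [-1, 1], scale to a replica delta, round to an integer."""
    a = max(-1.0, min(1.0, action))
    delta = round(a * max_step)  # many distinct actions collapse to the same delta
    return max(min_replicas, min(max_replicas, current_replicas + delta))

def discrete_to_replicas(action: int, current_replicas: int,
                         min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Hypothetical mapping for discrete-action agents (e.g. DQN):
    0 = scale in by one, 1 = hold, 2 = scale out by one."""
    delta = (-1, 0, 1)[action]
    return max(min_replicas, min(max_replicas, current_replicas + delta))

print(hpa_desired_replicas(4, current_metric=90.0, target_metric=60.0))  # → 6
print(continuous_to_replicas(0.1, 4), continuous_to_replicas(0.15, 4))   # → 4 4
```

Under this mapping, every continuous action in roughly (-0.17, 0.17) yields the same replica count, so small policy updates may have no effect on the environment, whereas a discrete agent picks among the outcomes directly. This is one illustrative mechanism, not necessarily the one the authors identify.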

Distributed Systems & Hardware | Eval Frameworks & Benchmarks | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 24

Year: 2026

Venue: N/A
