This paper addresses the challenge of efficient task scheduling in AI as a Service (AIaaS) platforms, which involve preprocessing, training, and inference tasks on heterogeneous CPU-GPU systems. The authors propose a user-experience-and-performance-balanced reinforcement learning (UXP-RL) algorithm that considers 11 factors, including queuing task information and resource release time estimation, to optimize task assignment to CPUs or GPUs. Experimental results demonstrate that UXP-RL reduces average turnaround time by 27.66% to 57.81% compared to heuristic approaches, and that a distributed UXP-RL scheduler further reduces turnaround time by 89.07% compared to a centralized scheduler.
Ditch your heuristics: a new RL-based scheduler slashes AI-as-a-Service turnaround times by up to 58% compared to traditional methods.
The rise of AI solutions has driven the emergence of AI as a Service (AIaaS), which offers cost-effective and scalable solutions by outsourcing AI functionalities to specialized providers. Within AIaaS, three key components are essential: segmenting AI services into preprocessing, training, and inference tasks; utilizing GPU-CPU heterogeneous systems, where GPUs handle parallel processing and CPUs manage sequential tasks; and minimizing latency in a distributed architecture consisting of cloud, edge, and fog computing. Efficient task scheduling is crucial to optimizing performance across these components. To enhance task scheduling in AIaaS, we propose a user-experience-and-performance-balanced reinforcement learning (UXP-RL) algorithm. UXP-RL considers 11 factors, including queuing task information, estimated resource release times, and the outcomes of previous actions, to select the optimal AI task for execution on either a GPU or a CPU. This method effectively reduces average turnaround time, particularly for rapid inference tasks. Our experimental findings show that the proposed RL-based scheduling algorithm reduces average turnaround time by 27.66% to 57.81% compared with heuristic approaches such as shortest job first (SJF) and first come, first served (FCFS). In a distributed architecture, using distributed RL schedulers reduces average turnaround time by 89.07% compared to a centralized scheduler.
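To make the scheduling setup concrete, the sketch below shows how a policy might score each queued task against a CPU and a GPU using a few of the factors the abstract names (queuing information, estimated resource release time). The feature set, the linear value function, the epsilon-greedy exploration, and all names and weights are illustrative assumptions for a toy stand-in, not the paper's actual UXP-RL model or its learned policy.

```python
import random

# Hypothetical feature extraction covering a few of the 11 factors the
# abstract mentions (queuing info, estimated resource release time).
# Feature names and structure are illustrative assumptions.
def task_features(task, device, now):
    return [
        task["wait_time"],                    # time the task has spent queued
        task["est_runtime"][device["name"]],  # estimated runtime on this device
        device["release_time"] - now,         # wait until the device is free
    ]

class EpsilonGreedyScheduler:
    """Toy stand-in for an RL scheduling policy: scores every
    (task, device) pair with a linear value function and picks the
    highest-scoring pair, exploring with probability epsilon."""

    def __init__(self, weights, epsilon=0.1, seed=0):
        self.weights = weights
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def score(self, feats):
        return sum(w * f for w, f in zip(self.weights, feats))

    def select(self, queue, devices, now=0.0):
        pairs = [(t, d) for t in queue for d in devices]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(pairs)  # explore a random assignment
        # Exploit: pick the pair the value function ranks highest.
        return max(pairs, key=lambda p: self.score(task_features(p[0], p[1], now)))

# Example weights: reward long-waiting tasks, penalize long runtimes
# and busy devices (all values invented for illustration).
tasks = [
    {"id": "A", "wait_time": 5.0, "est_runtime": {"gpu": 1.0, "cpu": 4.0}},
    {"id": "B", "wait_time": 1.0, "est_runtime": {"gpu": 2.0, "cpu": 3.0}},
]
devices = [
    {"name": "gpu", "release_time": 0.0},
    {"name": "cpu", "release_time": 2.0},
]

sched = EpsilonGreedyScheduler(weights=[1.0, -1.0, -1.0], epsilon=0.0)
task, device = sched.select(tasks, devices)
```

With exploration disabled (`epsilon=0.0`), the toy policy deterministically assigns the long-waiting task A to the idle GPU; the paper's contribution is learning such a policy from turnaround-time feedback rather than hand-tuning the weights.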