This paper addresses the challenge of efficient task scheduling in AI as a Service (AIaaS) platforms, which involve preprocessing, training, and inference tasks on heterogeneous CPU-GPU systems. The authors propose a user-experience-and-performance-balanced reinforcement learning (UXP-RL) algorithm that considers 11 factors, including queuing task information and resource release time estimation, to optimize task assignment to CPUs or GPUs. Experimental results demonstrate that UXP-RL reduces average turnaround time by 27.66% to 57.81% compared to heuristic approaches, and that a distributed UXP-RL scheduler further reduces turnaround time by 89.07% compared to a centralized scheduler.
Ditch your heuristics: a new RL-based scheduler slashes AI-as-a-Service turnaround times by up to 58% compared to traditional methods.
The rise of AI solutions has driven the emergence of AI as a Service (AIaaS), which offers cost-effective and scalable solutions by outsourcing AI functionalities to specialized providers. Within AIaaS, three key components are essential: segmenting AI services into preprocessing, training, and inference tasks; utilizing GPU-CPU heterogeneous systems, where GPUs handle parallel processing and CPUs manage sequential tasks; and minimizing latency in a distributed architecture consisting of cloud, edge, and fog computing. Efficient task scheduling is crucial to optimizing performance across these components. To enhance task scheduling in AIaaS, we propose a user-experience-and-performance-balanced reinforcement learning (UXP-RL) algorithm. UXP-RL considers 11 factors, including queuing task information, estimated resource release times, and the outcomes of previous actions, to select the optimal AI task for execution on either a GPU or a CPU. This method effectively reduces average turnaround time, particularly for rapid inference tasks. Our experimental findings show that the proposed RL-based scheduling algorithm reduces average turnaround time by 27.66% to 57.81% compared with heuristic approaches such as shortest job first (SJF) and first come, first served (FCFS). In a distributed architecture, using distributed RL schedulers reduces average turnaround time by 89.07% compared to a centralized scheduler.
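To make the scheduling setup concrete, the sketch below shows how a policy might score each queued task against a CPU and a GPU using a few of the factors the abstract names (queuing information, estimated resource release time). The feature set, the linear value function, the epsilon-greedy exploration, and all names and weights are illustrative assumptions for a toy stand-in, not the paper's actual UXP-RL model or its learned policy.

```python
import random

# Hypothetical feature extraction covering a few of the 11 factors the
# abstract mentions (queuing info, estimated resource release time).
# Feature names and structure are illustrative assumptions.
def task_features(task, device, now):
    return [
        task["wait_time"],                    # time the task has spent queued
        task["est_runtime"][device["name"]],  # estimated runtime on this device
        device["release_time"] - now,         # wait until the device is free
    ]

class EpsilonGreedyScheduler:
    """Toy stand-in for an RL scheduling policy: scores every
    (task, device) pair with a linear value function and picks the
    highest-scoring pair, exploring with probability epsilon."""

    def __init__(self, weights, epsilon=0.1, seed=0):
        self.weights = weights
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def score(self, feats):
        return sum(w * f for w, f in zip(self.weights, feats))

    def select(self, queue, devices, now=0.0):
        pairs = [(t, d) for t in queue for d in devices]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(pairs)  # explore a random assignment
        # Exploit: pick the pair the value function ranks highest.
        return max(pairs, key=lambda p: self.score(task_features(p[0], p[1], now)))

# Example weights: reward long-waiting tasks, penalize long runtimes
# and busy devices (all values invented for illustration).
tasks = [
    {"id": "A", "wait_time": 5.0, "est_runtime": {"gpu": 1.0, "cpu": 4.0}},
    {"id": "B", "wait_time": 1.0, "est_runtime": {"gpu": 2.0, "cpu": 3.0}},
]
devices = [
    {"name": "gpu", "release_time": 0.0},
    {"name": "cpu", "release_time": 2.0},
]

sched = EpsilonGreedyScheduler(weights=[1.0, -1.0, -1.0], epsilon=0.0)
task, device = sched.select(tasks, devices)
```

With exploration disabled (`epsilon=0.0`), the toy policy deterministically assigns the long-waiting task A to the idle GPU; the paper's contribution is learning such a policy from turnaround-time feedback rather than hand-tuning the weights.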