SEU | University of Thessaly | UvA | Apr 13, 2026 | arXiv:2604.11948

Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores

Yixian Shen, Chaoyao Shen, Jan Deen, George Floros, Andy Pimentel, Anuj Pathania

AI Summary

This paper introduces AILFM, an Active Imitation Learning (AIL) framework, to address thermal management and performance optimization challenges in 3D S-NUCA many-core systems running Large Foundation Model (LFM) inference. AILFM learns near-optimal thermal-aware scheduling policies by imitating Oracle demonstrations, effectively managing thread migration and V/f scaling based on core-level performance heterogeneity and kernel-specific behavior. Experiments demonstrate that AILFM outperforms state-of-the-art baselines and generalizes well across diverse LFM workloads, showcasing its effectiveness in balancing thermal safety and performance.

Key Contribution

Ditch the GPU? Active Imitation Learning can tame the thermal chaos of running large foundation models on 3D-stacked CPUs, unlocking a cost-effective alternative for LFM inference.

Abstract

Large Foundation Model (LFM) inference is both memory- and compute-intensive and has traditionally relied on GPUs. However, the limited availability and high cost of GPUs have motivated the adoption of high-performance general-purpose CPUs, especially emerging 3D-stacked Static Non-Uniform Cache Architecture (3D S-NUCA) systems. These architectures offer enhanced bandwidth and locality but suffer from severe thermal challenges and uneven cache latencies due to their 3D Networks-on-Chip (NoC). Optimal management of thread migration and V/f scaling is non-trivial due to LFM kernel diversity and system heterogeneity. Existing thermal management approaches often rely on oversimplified analytical models and lack adaptability. We propose AILFM, an Active Imitation Learning (AIL)-based scheduling framework that learns near-optimal thermal-aware scheduling policies from Oracle demonstrations with minimal run-time overhead. AILFM accounts for both core-level performance heterogeneity and kernel-specific behavior in LFMs to maintain thermal safety while maximizing performance. Extensive experiments show that AILFM outperforms state-of-the-art baselines and generalizes well across diverse LFM workloads.
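The abstract does not spell out how AILFM queries its Oracle, so the following is only a toy illustration of the general active imitation learning idea, not the authors' method. It assumes a hypothetical one-dimensional thermal model, a hypothetical 80-degree limit, three made-up V/f levels, and a nearest-neighbour learner that asks a stand-in `oracle` for a label only when it sees a state far from anything already labelled. All names and constants here are assumptions introduced for illustration.

```python
import random

T_LIMIT = 80.0     # hypothetical thermal limit in degrees Celsius (assumption)
MISMATCH = 1000.0  # distance penalty so kernel classes never share labels


def oracle(temp, compute_bound):
    """Toy stand-in for an Oracle: returns a V/f level (0 low, 1 mid, 2 high)."""
    if temp >= T_LIMIT:
        return 0  # throttle to stay thermally safe
    return 2 if compute_bound else 1


class NearestNeighbourPolicy:
    """1-nearest-neighbour learner over (temperature, kernel class) states."""

    def __init__(self):
        self.samples = []  # list of ((temp, compute_bound), action)

    def add(self, state, action):
        self.samples.append((state, action))

    def _distance(self, a, b):
        return abs(a[0] - b[0]) + (MISMATCH if a[1] != b[1] else 0.0)

    def predict(self, state):
        """Return (action, distance to the nearest labelled state)."""
        nearest, action = min(
            self.samples, key=lambda s: self._distance(s[0], state)
        )
        return action, self._distance(nearest, state)


def train(rounds=5, steps=200, query_dist=2.0, seed=0):
    rng = random.Random(seed)
    policy = NearestNeighbourPolicy()
    for compute_bound in (False, True):  # one seed demonstration per class
        policy.add((60.0, compute_bound), oracle(60.0, compute_bound))
    queries = 0
    for _ in range(rounds):
        temp = 60.0
        for _ in range(steps):
            state = (temp, rng.random() < 0.5)
            action, dist = policy.predict(state)
            if dist > query_dist:  # learner is unsure: query the Oracle
                policy.add(state, oracle(*state))
                queries += 1
            # Toy thermal model driven by the learner's own action, so the
            # labelled states come from the learner's own trajectories.
            temp += (action - 1) * 3.0 + rng.uniform(-1.0, 1.0)
            temp = max(40.0, min(100.0, temp))
    return policy, queries, rounds * steps


def agreement(policy):
    """Fraction of grid states where the learner matches the Oracle."""
    grid = [(float(t), cb) for t in range(40, 101) for cb in (False, True)]
    hits = sum(policy.predict(s)[0] == oracle(*s) for s in grid)
    return hits / len(grid)
```

Calling `train()` returns the learned policy, the number of Oracle queries, and the total number of scheduling steps. In this sketch the query count stays far below the step count because a label is requested only for states more than `query_dist` away from every labelled state of the same kernel class.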

Architecture Design (Transformers, SSMs, MoE) | Distributed Systems & Hardware | Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
