This study examines the thermal constraints of deploying computing systems in space, comparing GPUs with high-bandwidth memory (HBM) against compute-in-memory (CIM) accelerators. Using a radiator-in-the-loop co-design methodology that links achievable system performance (TOPS) to radiator cooling capacity, the research shows that GPUs develop severe thermal hotspots that force throttling, whereas CIM accelerators maintain a more uniform heat distribution and deliver higher TOPS/W. The findings indicate that CIM accelerators are considerably more energy-efficient than GPUs for AI workloads in thermally constrained space environments, making them a more viable option for future orbital data centers.
CIM accelerators can outperform GPUs in space because their more uniform heat distribution avoids the hotspots and throttling that limit GPUs, which is crucial for high-performance AI workloads under tight thermal constraints.
The rapid growth in compute demand from artificial intelligence (AI) has driven a massive surge in data center construction, precipitating an energy and sustainability crisis. Motivated by the abundant solar energy in outer space and the recent sharp reduction in launch costs, orbital data centers are emerging as a potential pathway for the future scaling of AI compute infrastructure. While the cold background of space seems appealing for cooling, computing systems operating in vacuum have no convection and ultimately rely on radiative cooling, which requires large-area radiators. This limitation in thermal management poses a significant challenge for deploying standard liquid- or air-cooled computers in space. In this work, we investigate the impact of space thermal constraints on both graphics processing units (GPUs) with high-bandwidth memory (HBM) and emerging compute-in-memory (CIM) accelerators. We develop a radiator-in-the-loop co-design methodology that directly links the permitted system TOPS (tera-operations per second) to the practical radiator cooling capacity in space. Our thermal simulations reveal that the GPU die and the separately located HBM stacks create severe thermal hotspots under limited radiator capacity, necessitating GPU thermal throttling. In contrast, CIM accelerators exhibit a much more uniform heat distribution and consistently outperform GPUs in TOPS/W across a wide range of radiator budgets. We systematically evaluate the performance of CIM and GPUs across various AI workloads and demonstrate that CIM's advantage is magnified when deployed in space under realistic thermal constraints.
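The abstract does not give the radiator model itself, but the basic link between radiator capacity and permitted TOPS can be sketched with the Stefan-Boltzmann law. This is a minimal first-order sketch under stated assumptions, not the authors' method: an idealized radiator at a single uniform temperature, no absorbed sunlight or Earth albedo, a deep-space sink of about 2.7 K, and all electrical power converted to heat that the radiator must reject. The function and parameter names are hypothetical. Under these assumptions, a fixed radiator budget caps total power, so permitted throughput scales directly with TOPS/W.

```python
# Stefan-Boltzmann constant in W m^-2 K^-4 (CODATA value).
STEFAN_BOLTZMANN = 5.670374419e-8


def radiator_capacity_w(area_m2, emissivity, t_radiator_k, t_sink_k=2.7, sides=1):
    """Net heat (W) rejected by an idealized radiator.

    Assumptions (hypothetical simplification): uniform radiator temperature,
    no absorbed solar or albedo flux, view factor of 1 to the sink.
    """
    return sides * emissivity * STEFAN_BOLTZMANN * area_m2 * (
        t_radiator_k**4 - t_sink_k**4
    )


def permitted_tops(radiator_w, tops_per_watt):
    """Throughput a radiator budget can sustain if all compute power becomes heat."""
    return radiator_w * tops_per_watt


# Hypothetical example: the same radiator supports more TOPS for a design
# with higher TOPS/W, which is why efficiency matters under a fixed budget.
budget_w = radiator_capacity_w(area_m2=10.0, emissivity=0.9, t_radiator_k=330.0)
for label, eff in [("design A", 1.0), ("design B", 4.0)]:
    print(f"{label}: {permitted_tops(budget_w, eff):.0f} TOPS at {budget_w:.0f} W")
```

This lumped model captures only the total-power ceiling. The hotspot effect described above, where the GPU die and HBM stacks must throttle before the radiator's total capacity is reached, requires a spatial thermal simulation that this sketch does not attempt.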