CASSYSU | Apr 21, 2026 | arXiv:2604.19337

POLAR-PIC: A Holistic Framework for Matrixized PIC with Co-Designed Compute, Layout, and Communication

Y. Rao, Yizhuo Rao, Xingjian Cui, Xing-Li Cui, Shangzhi Pang, Jiabin Xie, Guangnan Feng, Jinhui Wei, Ziyan Zhang, Languang Gao, Zhenyu Wang, Zhiguang Chen, Yutong Lu

AI Summary

POLAR-PIC is a co-designed framework for Particle-in-Cell (PIC) simulations that targets two scalability bottlenecks: particle-grid interaction and particle redistribution. It reformulates field interpolation into a matrix-friendly form, maintains a physically ordered particle layout, and overlaps particle communication with deposition. On the pilot system of an exascale supercomputer, POLAR-PIC accelerates the particle-processing phase by up to 10.9x in uniform plasma and 4.4x in laser-ion acceleration scenarios compared to WarpX, and it maintains 67.5% weak scaling efficiency on over 2 million cores.
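To make the idea of a physically ordered particle layout concrete, here is a minimal NumPy sketch of one generic way to keep particles that share a grid cell contiguous in memory: a stable sort by cell index plus per-cell offsets. This is an illustration under stated assumptions, not the scheme used in POLAR-PIC, which this summary does not describe. The function name `order_particles_by_cell`, the 2D uniform grid, and the row-major cell numbering are all hypothetical choices.

```python
import numpy as np

def order_particles_by_cell(x, y, nx, ny, dx=1.0, dy=1.0):
    """Return a permutation that groups particles by grid cell.

    Hypothetical helper for illustration only: assumes a 2D uniform grid
    of nx-by-ny cells with spacing (dx, dy) and row-major cell numbering.
    """
    # Integer cell coordinates of each particle, clamped to the grid.
    ix = np.clip(np.floor(x / dx).astype(np.int64), 0, nx - 1)
    iy = np.clip(np.floor(y / dy).astype(np.int64), 0, ny - 1)
    cell = ix * ny + iy

    # A stable sort keeps the relative order of particles within one cell.
    order = np.argsort(cell, kind="stable")

    # offsets[c]:offsets[c + 1] is the slice of `order` belonging to cell c.
    counts = np.bincount(cell, minlength=nx * ny)
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return order, offsets

x = np.array([2.5, 0.1, 1.2, 0.9])
y = np.array([0.5, 0.5, 1.5, 0.2])
order, offsets = order_particles_by_cell(x, y, nx=3, ny=2)
x_sorted, y_sorted = x[order], y[order]  # particles of a cell are now adjacent
```

With this ordering, all particles in cell `c` occupy the contiguous range `offsets[c]:offsets[c + 1]`, so a per-cell kernel reads particle data with unit stride.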

Key Contribution

By reformulating particle-in-cell simulation for matrix processing units and co-designing compute, layout, and communication, POLAR-PIC speeds up the particle-processing phase by up to 10.9x over WarpX and maintains 67.5% weak scaling efficiency on over 2 million cores. It also reaches a higher fraction of theoretical peak on a CPU-based system (13.2%) than WarpX does on NVIDIA A800 GPUs (9.6%).

Abstract

Particle-in-Cell (PIC) simulations are fundamental to plasma physics but often suffer from limited scalability due to particle-grid interaction bottlenecks and particle redistribution costs. Specifically, the particle-grid interaction computations have not taken full advantage of the emerging Matrix Processing Units (MPUs), the particle motion introduces irregular memory accesses, and the bulk-synchronous redistribution further destroys long-term data locality, thereby limiting parallel efficiency. To address these inefficiencies, we present POLAR-PIC, a co-designed framework for large-scale PIC simulations that (i) reformulates Field Interpolation into an MPU-friendly outer-product form, (ii) maintains a physically ordered particle layout to preserve memory contiguity, and (iii) overlaps particle communication with Deposition to hide redistribution overhead. The evaluation on the pilot system of an Exascale supercomputer demonstrates that POLAR-PIC accelerates the entire particle-processing phase by up to 10.9x in uniform plasma and 4.4x in real-world laser-ion acceleration scenarios compared to the native WarpX reference pipeline on LX2. Ablation studies reveal that the speedups achieved by Interpolation and Deposition are 8.0x and 13.2x, respectively, and the asynchronous communication design sustains a 99.1% overlap ratio. In cross-platform comparisons, POLAR-PIC achieves 13.2% of theoretical peak efficiency on the CPU-based LS system, while WarpX reaches 9.6% on NVIDIA A800 GPUs. Notably, the scalability evaluation demonstrates that POLAR-PIC maintains 67.5% weak scaling efficiency on over 2 million cores under high-migration dynamic workloads, highlighting the importance of holistic co-design for future matrix-centric HPC systems.
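The abstract does not spell out the outer-product formulation, so the following is only a minimal sketch of the general idea, assuming a 2D grid and linear (cloud-in-cell) shape functions. For a separable shape function, the weight a particle assigns to stencil node (i, j) is the outer product of its 1D weights, so interpolation for a batch of particles in one cell can be written as a small dense matrix product. The names `cic_weights`, `interpolate_cell_batch`, and `interpolate_reference` are hypothetical and not taken from POLAR-PIC or WarpX.

```python
import numpy as np

def cic_weights(frac):
    """1D linear weights for fractional in-cell offsets in [0, 1); shape (n, 2)."""
    return np.stack([1.0 - frac, frac], axis=1)

def interpolate_cell_batch(field_patch, fx, fy):
    """Interpolate a 2x2 field patch to n particles of one cell via a matmul.

    Computes E_p = wx_p^T F wy_p for every particle p, where the 2D weight
    of node (i, j) is the outer product wx_p[i] * wy_p[j].
    """
    wx = cic_weights(fx)  # (n, 2)
    wy = cic_weights(fy)  # (n, 2)
    # (n, 2) @ (2, 2) -> (n, 2), followed by a row-wise dot product with wy.
    return np.sum((wx @ field_patch) * wy, axis=1)

def interpolate_reference(field_patch, fx, fy):
    """Per-particle loop that forms the explicit outer-product weight matrix."""
    out = np.empty_like(fx)
    for p in range(fx.size):
        w = np.outer([1.0 - fx[p], fx[p]], [1.0 - fy[p], fy[p]])
        out[p] = np.sum(w * field_patch)
    return out

field_patch = np.array([[1.0, 2.0], [3.0, 4.0]])  # field values at the 4 cell corners
fx = np.array([0.0, 0.5, 0.25])
fy = np.array([0.0, 0.5, 0.75])
batched = interpolate_cell_batch(field_patch, fx, fy)
looped = interpolate_reference(field_patch, fx, fy)
```

The batched version replaces the per-particle gather-and-sum with one dense product per cell, which is the kind of operation a matrix unit handles well. It relies on particles of the same cell being processed together.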

Architecture Design (Transformers, SSMs, MoE) | Distributed Systems & Hardware | Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 40

Year: 2026

Venue: N/A
