This paper benchmarks the FleCSI framework, comparing its MPI backend with its asynchronous many-task runtime (AMTR) backends, Legion and HPX, using a Poisson solver and the HARD radiation hydrodynamics code. The authors quantify the performance and overhead of the AMTR backends relative to MPI on up to 1024 nodes. They find that HPX introduces only marginal overhead compared to MPI+Kokkos in the communication-focused benchmark, and that it outperforms MPI+Kokkos in the computation-focused radiation hydrodynamics benchmarks on fewer than 64 nodes. The study highlights the potential of AMTRs for specific computational workloads while also revealing current limitations in scalability and collective operations.
Asynchronous many-task runtimes like HPX can outperform MPI in computation-heavy radiation hydrodynamics simulations at modest node counts, but scalability bottlenecks remain.
Writing efficient distributed code remains labor-intensive and complex. To simplify application development, the Flexible Computational Science Infrastructure (FleCSI) framework offers a user-oriented, high-level programming interface built upon a task-based runtime model. Internally, FleCSI integrates state-of-the-art parallelization backends, including MPI and the asynchronous many-task runtimes (AMTRs) Legion and HPX, enabling applications to fully leverage asynchronous parallelism. In this work, we benchmark two applications using FleCSI's three backends on up to 1024 nodes to quantify the advantages and overheads introduced by the AMTR backends. As representative applications, we select a simple Poisson solver and the multidimensional radiation hydrodynamics code HARD. In the communication-focused Poisson solver benchmark, FleCSI achieves over 97% parallel efficiency with the MPI backend under weak scaling on up to 131072 cores, indicating that its abstraction layer introduces only minimal overhead. While the Legion backend exhibits notable overheads and scaling limitations, the HPX backend introduces only marginal overhead compared to MPI+Kokkos. However, the scalability of the HPX backend is currently limited by its use of non-optimized HPX collective operations. In the computation-focused radiation hydrodynamics benchmarks, the performance gap between the MPI and HPX backends closes. On fewer than 64 nodes, the HPX backend outperforms MPI+Kokkos, achieving an average speedup of 1.31 under weak scaling and up to 1.27 under strong scaling. For the hydrodynamics-only HARD benchmark, the HPX backend performs best on fewer than 32 nodes, achieving speedups of up to 1.20 relative to MPI and up to 1.64 relative to MPI+Kokkos.
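The results above are reported as weak-scaling parallel efficiency and as speedups between backends. As a reading aid, the block below assumes the conventional definitions of these metrics; the paper's exact reference node count $N_0$ and timing methodology are not given in the abstract, so treat this as an interpretation rather than the authors' formulas. Here $T$ is wall-clock time and $N$ is the node count. The per-node core count is derived only from the two maxima quoted above and assumes every core on every node was used.

```latex
% Weak scaling: work per node is held fixed while the node count N grows.
E_{\mathrm{weak}}(N) = \frac{T(N_0)}{T(N)}, \qquad
E_{\mathrm{weak}} \ge 0.97 \;\Longrightarrow\; T(N) \le \frac{T(N_0)}{0.97} \approx 1.031\,T(N_0)

% Speedup of backend B over reference backend R at the same node count N.
S_{B/R}(N) = \frac{T_R(N)}{T_B(N)}, \qquad
S_{\mathrm{HPX}/\mathrm{MPI+Kokkos}} = 1.31 \;\Longrightarrow\;
T_{\mathrm{HPX}} \approx \frac{T_{\mathrm{MPI+Kokkos}}}{1.31} \approx 0.76\,T_{\mathrm{MPI+Kokkos}}

% Core count implied by the largest runs (assumes fully populated nodes).
\frac{131072\ \text{cores}}{1024\ \text{nodes}} = 128\ \text{cores per node}
```

Read this way, an efficiency above 97% means the runtime grows by at most about 3% from the reference run to the largest run. An average speedup of 1.31 means HPX needs roughly three quarters of the MPI+Kokkos runtime.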