This paper details a major refactoring of MPICH to support true MPI Sessions, decoupling it from the global MPI_COMM_WORLD communicator. The work addresses scalability bottlenecks inherent in traditional MPI implementations on exascale systems. Benchmarks demonstrate that the new implementation, leveraging explicit hierarchical designs, achieves significant scalability improvements compared to the prior, world-communicator-dependent approach.
Ditching the global MPI_COMM_WORLD communicator unlocks significant scalability gains for MPI applications on exascale systems.
Sessions is one of the major features introduced in the MPI-4 standard. It offers an alternative to the traditional world communicator model by allowing applications to construct communicators from process sets, thereby eliminating the dependency on MPI_COMM_WORLD. The Sessions model was proposed as a more scalable solution for exascale systems, where MPI_COMM_WORLD was viewed as a potential scalability bottleneck. However, supporting Sessions is a significant challenge for established codebases like MPICH because the world model is deeply integrated into traditional MPI implementations. Although MPICH added support for the MPI-4 standard upon its release, it still relied internally on a global world communicator. This approach allowed applications written using the Sessions model to function, but it did not fulfill the full design intent of Sessions, which was to decouple MPI from MPI_COMM_WORLD. We present MPICH's effort to support true MPI Sessions, including a major internal refactoring, describe the architectural changes this required, and evaluate the scalability of the resulting implementation. Our results demonstrate that true Sessions can offer significant scalability benefits by adopting explicit hierarchical designs.