RIKEN | Feb 17, 2026 | arXiv:2602.15995

Distributed Order Recording Techniques for Efficient Record-and-Replay of Multi-threaded Programs

Xiang Fu, Shiman Meng, Weiping Zhang, Luanzheng Guo, Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee

AI Summary

This paper introduces two distributed order recording techniques, Distributed Clock (DC) and Distributed Epoch (DE), to improve the efficiency of record-and-replay for multi-threaded OpenMP programs. These techniques reduce thread synchronization overhead compared to traditional approaches that synchronize on every shared-memory access. Experiments using ReOMP on HPC applications demonstrate a 2-5x performance improvement, and integration with ReMPI enables replay of MPI+OpenMP applications with minimal overhead.

Key Contribution

Distributed order recording (DC and DE) makes record-and-replay of OpenMP programs 2-5x more efficient than traditional approaches that synchronize on every shared-memory access, and it combines with MPI-level replay for MPI+OpenMP applications.

Abstract

After all these years and all these other shared-memory programming frameworks, OpenMP is still the most popular one. However, its greater levels of non-deterministic execution make debugging and testing more challenging. The ability to record and deterministically replay the program execution is key to addressing this challenge. However, scalably replaying OpenMP programs is still an unresolved problem. In this paper, we propose two novel techniques that use Distributed Clock (DC) and Distributed Epoch (DE) recording schemes to eliminate excessive thread synchronization for OpenMP record and replay. Our evaluation on representative HPC applications with ReOMP, which we used to realize DC and DE recording, shows that our approach is 2-5x more efficient than traditional approaches that synchronize on every shared-memory access. Furthermore, we demonstrate that our approach can be easily combined with MPI-level replay tools to replay non-trivial MPI+OpenMP applications. We achieve this by integrating ReOMP into ReMPI, an existing scalable MPI record-and-replay tool, with only a small MPI-scale-independent runtime overhead.
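To make the baseline concrete, the following is a minimal sketch of the traditional scheme the abstract compares against: every shared-memory access is serialized through one global lock, the order of thread IDs is logged, and replay forces threads to follow that log. It is not ReOMP's implementation and does not show DC or DE, whose details the abstract does not give. The names `Recorder`, `Replayer`, and `run` are hypothetical, and Python threads stand in for OpenMP threads purely for illustration.

```python
import threading


class Recorder:
    """Record mode (hypothetical baseline): one global lock guards every
    shared access, and the ID of the thread that wins the lock is logged."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []

    def access(self, tid, op):
        with self._lock:
            self.log.append(tid)  # remember who accessed shared memory next
            op()                  # perform the shared-memory access itself


class Replayer:
    """Replay mode: a thread may perform its access only when the recorded
    log says it is that thread's turn."""

    def __init__(self, log):
        self._log = list(log)
        self._pos = 0
        self._cv = threading.Condition()

    def access(self, tid, op):
        with self._cv:
            # Block until the next recorded access belongs to this thread.
            self._cv.wait_for(lambda: self._log[self._pos] == tid)
            op()
            self._pos += 1
            self._cv.notify_all()


def run(controller, n_threads=4, n_iters=50):
    """Order-dependent workload: the final value depends on the interleaving."""
    state = {"x": 1}

    def worker(tid):
        for _ in range(n_iters):
            def op():
                state["x"] = (state["x"] * 31 + tid) % 1_000_003
            controller.access(tid, op)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["x"]


if __name__ == "__main__":
    recorder = Recorder()
    recorded_value = run(recorder)
    replayed_value = run(Replayer(recorder.log))
    print(recorded_value == replayed_value)
```

In this sketch every access in both modes contends on a single shared lock or condition variable, which is the per-access synchronization cost that, according to the abstract, the DC and DE recording schemes are designed to reduce.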

Code Generation & Program Synthesis | Distributed Systems & Hardware


Year: 2026

Venue: N/A
