This paper introduces an accelerated infrastructure for the hpcanalysis framework, designed to handle the massive telemetry data generated by exascale systems. A high-performance C++ API ingests data for 100,000 MPI ranks in 9.69 seconds on Aurora, and a GPU-accelerated layer achieves up to 314x speedup over CPU-based processing when analyzing 100,000 execution traces. The framework also incorporates a topology-aware workflow that maps performance outliers to physical Slingshot interconnect coordinates, and a novel tri-dimensional performance model that identified a 32.28% potential speedup for a GAMESS workload on Frontier.
Analyzing exascale performance bottlenecks just got hundreds of times faster, thanks to a new GPU-accelerated framework that pinpoints congestion and predicts optimization opportunities in scientific workloads.
As exascale systems reach unprecedented concurrency, traditional performance analysis tools struggle with the overhead of massive-scale telemetry. We present an accelerated infrastructure for the hpcanalysis framework that leverages a high-performance C++ API and GPU parallelism to enable high-throughput diagnostics. Our C++ API achieves a 9.69-second ingestion time for 100,000 MPI ranks on Aurora. Furthermore, our GPU-accelerated layer achieves up to 314x speedup over CPU-based processing when analyzing 100,000 execution traces. In addition, we implement a topology-aware workflow that maps logical performance outliers to physical Slingshot interconnect coordinates, localizing network congestion across 22 distinct racks on Aurora. We also demonstrate how the framework's advanced interface seamlessly integrates with external tools to provide sophisticated analytical models. Finally, we introduce a novel tri-dimensional performance model that "re-materializes" iterative behavior from execution traces; using this model, we identified a 32.28% potential speedup for a GAMESS workload on Frontier.
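To make the topology-aware idea concrete, the following is a minimal sketch of mapping per-rank outliers to racks. It is not the hpcanalysis API. The function names, the per-rank wait-time metric, the rack labels, and the use of a median-absolute-deviation (MAD) outlier test are all assumptions chosen for illustration; the abstract does not state how outliers are detected or how ranks are resolved to interconnect coordinates.

```python
from collections import Counter
from statistics import median


def find_outlier_ranks(wait_times, threshold=3.5):
    """Return ranks whose metric is unusually high (hypothetical helper).

    Uses the modified z-score based on the median absolute deviation,
    an assumed stand-in for whatever test the framework applies.
    """
    values = list(wait_times.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be flagged
    return sorted(
        rank
        for rank, value in wait_times.items()
        if 0.6745 * (value - med) / mad > threshold
    )


def outliers_per_rack(outlier_ranks, rank_to_rack):
    """Count flagged ranks per physical rack (hypothetical helper)."""
    return Counter(rank_to_rack[rank] for rank in outlier_ranks)


# Toy data: seconds spent waiting in communication, keyed by MPI rank.
wait_times = {0: 1.0, 1: 1.1, 2: 0.9, 3: 1.2, 4: 1.0, 5: 9.0, 6: 1.1, 7: 8.5}

# Assumed placement of ranks on racks; the labels are made up.
rank_to_rack = {
    0: "rack-A", 1: "rack-A", 2: "rack-A", 3: "rack-A",
    4: "rack-B", 5: "rack-B", 6: "rack-B", 7: "rack-B",
}

flagged = find_outlier_ranks(wait_times)
print(flagged, dict(outliers_per_rack(flagged, rank_to_rack)))
```

Grouping flagged ranks by rack shows whether slow ranks cluster in one physical location, which is the kind of signal that would point to localized network congestion rather than an application-level imbalance.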