The paper introduces ucTrace, a profiling tool that exposes and visualizes UCX-driven communication in HPC environments. It addresses a gap in existing tools, which lack fine-grained UCX-level traces, do not capture transport-layer behavior, or are limited to specific MPI implementations. ucTrace profiles message passing at the UCX level and links operations between hosts and devices directly to their originating MPI functions, giving insight into MPI workflows. The experiments demonstrating the tool cover MPI point-to-point behavior under different UCX settings, Allreduce comparisons across MPI libraries, communication in a linear solver, NUMA binding effects, and GPU-accelerated GROMACS molecular dynamics simulations.
See inside the black box of UCX-driven communication with ucTrace, a new profiler that exposes the transport-layer behavior other tools miss.
UCX is a communication framework that enables low-latency, high-bandwidth communication in HPC systems. With its unified API, UCX facilitates efficient data transfers across multi-node CPU-GPU clusters. UCX is widely used as the transport layer for MPI, particularly in GPU-aware implementations. However, existing profiling tools lack fine-grained communication traces at the UCX level, do not capture transport-layer behavior, or are limited to specific MPI implementations. To address these gaps, we introduce ucTrace, a novel profiler that exposes and visualizes UCX-driven communication in HPC environments. ucTrace provides insights into MPI workflows by profiling message passing at the UCX level, linking operations between hosts and devices (e.g., GPUs and NICs) directly to their originating MPI functions. Through interactive visualizations of process- and device-specific interactions, ucTrace helps system administrators and library and application developers optimize performance and debug communication patterns in large-scale workloads. We demonstrate ucTrace's features through a wide range of experiments, including MPI point-to-point behavior under different UCX settings, Allreduce comparisons across MPI libraries, communication analysis of a linear solver, NUMA binding effects, and profiling of GPU-accelerated GROMACS molecular dynamics (MD) simulations at scale. ucTrace is publicly available at https://github.com/ParCoreLab/ucTrace.
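The abstract does not describe how ucTrace ties a UCX-level operation back to the MPI function that issued it. One simple way a profiler *could* do this is time-interval attribution: record when each MPI call starts and ends on a rank, then assign every transport-level event on that rank to the call whose interval contains it. The Python sketch below illustrates only that generic idea under stated assumptions. The record types `MpiCall` and `UcxEvent`, their fields, the operation and device labels, and the functions `attribute` and `bytes_per_call` are all hypothetical illustrations, not ucTrace's API or trace format. It also assumes one thread per rank with non-overlapping (blocking) MPI calls.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical record of one MPI call on one rank (not ucTrace's format).
@dataclass(frozen=True)
class MpiCall:
    rank: int
    name: str     # e.g. "MPI_Allreduce"
    start: float  # entry timestamp in seconds
    end: float    # exit timestamp in seconds


# Hypothetical record of one UCX-level transfer observed on a rank.
@dataclass(frozen=True)
class UcxEvent:
    rank: int
    op: str       # illustrative label, e.g. "put" or "tag_send"
    src_dev: str  # illustrative source device, e.g. "gpu0"
    dst_dev: str  # illustrative destination device, e.g. "mlx5_0"
    nbytes: int
    time: float   # timestamp in seconds


def attribute(calls: List[MpiCall],
              events: List[UcxEvent]) -> List[Tuple[UcxEvent, Optional[str]]]:
    """Pair each UCX event with the name of the MPI call on the same rank
    whose [start, end] interval contains the event, or None if no call does.
    Assumes MPI calls on a single rank never overlap."""
    by_rank: Dict[int, List[MpiCall]] = {}
    for c in sorted(calls, key=lambda c: c.start):
        by_rank.setdefault(c.rank, []).append(c)
    starts = {r: [c.start for c in cs] for r, cs in by_rank.items()}

    result = []
    for e in events:
        cs = by_rank.get(e.rank, [])
        # Index of the last call that started at or before the event.
        i = bisect_right(starts.get(e.rank, []), e.time) - 1
        owner = cs[i] if i >= 0 and cs[i].end >= e.time else None
        result.append((e, owner.name if owner else None))
    return result


def bytes_per_call(attributed: List[Tuple[UcxEvent, Optional[str]]]) -> Dict[str, int]:
    """Sum transferred bytes per originating MPI function."""
    totals: Dict[str, int] = {}
    for e, name in attributed:
        key = name or "<outside MPI>"
        totals[key] = totals.get(key, 0) + e.nbytes
    return totals


if __name__ == "__main__":
    calls = [MpiCall(0, "MPI_Send", 1.0, 1.5),
             MpiCall(0, "MPI_Allreduce", 2.0, 3.0)]
    events = [UcxEvent(0, "tag_send", "host", "mlx5_0", 4096, 1.2),
              UcxEvent(0, "put", "gpu0", "mlx5_0", 8192, 2.5),
              UcxEvent(0, "put", "gpu0", "mlx5_0", 8192, 2.7),
              UcxEvent(0, "am_send", "host", "mlx5_0", 64, 3.5)]
    print(bytes_per_call(attribute(calls, events)))
```

Running the script prints `{'MPI_Send': 4096, 'MPI_Allreduce': 16384, '<outside MPI>': 64}`. A purely time-based scheme like this has a clear limitation: with nonblocking MPI, transfers started by `MPI_Isend` may progress inside a later `MPI_Wait` and be charged to that call instead. Attributing operations to their true originating function, as the abstract describes ucTrace doing, would require richer context than timestamps alone.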