This paper surveys existing Operational Data Analytics (ODA) frameworks used in large-scale computing infrastructures like HPC systems, highlighting their capabilities and limitations. It then proposes a more comprehensive ODA framework tailored to large-scale graph-processing distributed ecosystems, building upon the work of Netti et al. The proposed framework aims to improve operational efficiencies, and the paper compares it to state-of-the-art frameworks to demonstrate its novelty and encourage further research in the field.
Existing operational data analytics frameworks leave significant gaps when applied to the complexities of modern, large-scale graph processing ecosystems, motivating a new holistic approach.
By 2025, zettabytes of data are being generated every year. Modern large-scale computing infrastructures such as High-Performance Computing (HPC) systems continue to grow in size and complexity, raising concerns about their manageability and sustainability. For this reason, these systems are equipped with fine-grained monitoring and Operational Data Analytics (ODA) capabilities to optimise their efficiency. In this literature study, we list the fundamental pillars of large-scale computing infrastructures that enable their ODA capabilities, and survey the popular ODA frameworks operating in such environments (predominantly HPC). Based on this survey, we propose a more holistic ODA framework that matches the layers of the large-scale graph-processing distributed ecosystem proposed by Sherif Sak et al. and extends the ODA functionalities of an existing ODA framework proposed by Netti et al. We compare our holistic framework with some of the state-of-the-art frameworks covered in this study to highlight its novelty, which we hope will draw more attention to research in this field. To show readers the practical value of this line of work, we highlight the significant operational efficiencies observed where state-of-the-art ODA frameworks have been implemented, and lastly discuss ongoing research trends in this field.