This paper introduces a temporal graph neural network model for fault discrimination in microservice architectures, addressing the challenges of diverse fault types, complex dependencies, and dynamic operational states. The model jointly learns temporal dynamics of service states using a temporal coding module and structural interactions via attention-based message passing. Experiments demonstrate that this joint modeling approach outperforms existing methods in fault discrimination across multiple evaluation metrics.
Untangling the chaotic web of microservice failures just got easier: a new model uses temporal graph neural networks to pinpoint faults by jointly learning how services evolve and interact.
To address the diverse fault morphologies, complex dependencies, and time-varying operational states of distributed microservice systems, this paper proposes a fault discrimination model based on temporal graph neural networks. The model represents microservice operation as a dynamic graph sequence that evolves over time, and learns temporal and structural representations jointly within a unified framework. First, service-level multi-source observation signals are aligned and encoded to construct node feature sequences and the corresponding time-varying dependency relations. Then, a temporal coding module extracts a representation of how each service's state evolves, and at each time step attention-based structured message passing captures dependency interactions and fault propagation, yielding a structure-enhanced temporal node representation. Next, a dual readout mechanism aggregates over the node and temporal dimensions to obtain a system-level global representation, from which the model outputs a distribution over fault categories. Finally, the model parameters are optimized with a supervised learning objective, so that the model learns stable discriminative evidence under complex interactions and multi-source noise. Comparative experiments show that the proposed method achieves superior performance on multiple evaluation metrics, supporting the claim that jointly modeling temporal evolution and dependency structure improves fault discrimination in distributed microservice systems.
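The pipeline described in the abstract (temporal coding, per-step attention-based message passing, dual readout, classification, supervised objective) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. Every concrete choice here is an assumption made for illustration: the temporal coding module is replaced by a simple exponential moving average, attention uses unparameterized scaled dot products, both readouts are plain means, the supervised objective is taken to be cross-entropy, and all function names, weights, and input values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_encode(x_seq, decay=0.5):
    """Stand-in temporal coding (assumption): exponential moving average.

    x_seq[t][i] is the feature vector of service i at time step t.
    """
    h_seq, prev = [], x_seq[0]
    for x_t in x_seq:
        prev = [[decay * p + (1 - decay) * v for p, v in zip(p_i, x_i)]
                for p_i, x_i in zip(prev, x_t)]
        h_seq.append(prev)
    return h_seq

def attention_message_pass(h_t, adj_t):
    """One round of dot-product attention over each node's dependencies.

    adj_t[i][j] is truthy when service i depends on service j at this step;
    every node also attends to itself.
    """
    dim = len(h_t[0])
    out = []
    for i, h_i in enumerate(h_t):
        nbrs = [j for j in range(len(h_t)) if j == i or adj_t[i][j]]
        scores = [sum(a * b for a, b in zip(h_i, h_t[j])) / math.sqrt(dim)
                  for j in nbrs]
        alpha = softmax(scores)
        out.append([sum(w * h_t[j][k] for w, j in zip(alpha, nbrs))
                    for k in range(dim)])
    return out

def dual_readout(z_seq):
    """Mean over nodes at each step, then mean over time steps."""
    per_step = [[sum(col) / len(z_t) for col in zip(*z_t)] for z_t in z_seq]
    return [sum(col) / len(per_step) for col in zip(*per_step)]

def classify(x_seq, adj_seq, weights, bias):
    """Return a probability distribution over fault categories."""
    h_seq = temporal_encode(x_seq)
    z_seq = [attention_message_pass(h_t, a_t)
             for h_t, a_t in zip(h_seq, adj_seq)]
    g = dual_readout(z_seq)
    logits = [sum(w * v for w, v in zip(row, g)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def cross_entropy(probs, label):
    """Assumed supervised objective: negative log-likelihood of the true class."""
    return -math.log(probs[label])

# Hypothetical input: 3 time steps, 2 services, 2 features per service.
x_seq = [
    [[0.1, 0.2], [0.0, 0.1]],
    [[0.4, 0.3], [0.1, 0.1]],
    [[0.9, 0.8], [0.6, 0.5]],
]
adj_seq = [[[0, 1], [0, 0]]] * 3  # service 0 depends on service 1 at every step
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 hypothetical fault classes
bias = [0.0, 0.0, 0.0]

probs = classify(x_seq, adj_seq, weights, bias)
loss = cross_entropy(probs, label=0)
```

In a trained model the moving average would be a learned sequence encoder, the attention scores would come from learned projections, and `weights` and `bias` would be fitted by minimizing the loss over labeled fault samples. The sketch only shows how the stages compose: node feature sequences and per-step dependency graphs go in, and a single distribution over fault categories comes out.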