This paper introduces SubstratumGraphEnv, a novel Gymnasium-compatible reinforcement learning environment for modeling system attack paths by simulating sequences of processes executed on a Windows OS using Sysmon logs. The environment represents OS state and transitions as a graph, capturing parent-child process relationships and system event variety. The environment is paired with a PyTorch interface (SubstratumBridge) and an Advantage Actor-Critic (A2C) model using Graph Convolutional Networks (GCNs) to process the graph-based observations.
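To make the graph representation concrete, the following is a minimal sketch of how parent-child process relationships might be turned into a directed graph. It is not the paper's implementation. The field names `ProcessGuid`, `ParentProcessGuid`, and `Image` match Sysmon Event ID 1 (process creation), but the records, the use of `None` for a missing parent, and the helper names `build_process_graph` and `paths_from` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified Sysmon Event ID 1 (process creation) records.
# Real Sysmon events carry many more fields; only a subset is used here.
events = [
    {"ProcessGuid": "A", "ParentProcessGuid": None, "Image": "explorer.exe"},
    {"ProcessGuid": "B", "ParentProcessGuid": "A", "Image": "winword.exe"},
    {"ProcessGuid": "C", "ParentProcessGuid": "B", "Image": "powershell.exe"},
    {"ProcessGuid": "D", "ParentProcessGuid": "A", "Image": "chrome.exe"},
]


def build_process_graph(events):
    """Return (nodes, children): image names keyed by GUID and parent -> child edges."""
    nodes = {}
    children = defaultdict(list)
    for ev in events:
        guid = ev["ProcessGuid"]
        nodes[guid] = ev["Image"]
        parent = ev.get("ParentProcessGuid")
        if parent is not None:  # assumption: None marks a process with no logged parent
            children[parent].append(guid)
    return nodes, dict(children)


def paths_from(root, children):
    """Enumerate root-to-leaf process chains with an iterative depth-first search."""
    stack = [[root]]
    paths = []
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)  # reached a process that spawned nothing
        for kid in kids:
            stack.append(path + [kid])
    return paths


nodes, children = build_process_graph(events)
# Translate GUID chains into readable image-name chains.
chains = [[nodes[g] for g in p] for p in paths_from("A", children)]
```

Each entry in `chains` is one candidate process lineage, such as a document editor spawning a shell, which is the kind of sequence an agent could learn to traverse.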
Automating attack path discovery just got easier: a new reinforcement learning environment translates raw system logs into graph-based states for training intelligent agents.
Automating network security analysis, particularly the identification of potential attack paths, presents significant challenges, due in part to the sequential, interconnected, and evolutionary nature of system events, which most artificial intelligence (AI) techniques struggle to model effectively. This paper proposes a Reinforcement Learning (RL) environment generation framework that simulates the sequence of processes executed on a Windows operating system, enabling dynamic modeling of malicious processes on a system. The methodology models operating system state and transitions using a graph representation derived from open-source System Monitor (Sysmon) logs. To address the variety in system event types, fields, and log formats, a mechanism was developed to capture and model parent-child processes from Sysmon logs. A Gymnasium environment (SubstratumGraphEnv) was constructed to establish the perceptible basis for an RL environment, and a customized PyTorch interface (SubstratumBridge) was built to translate Gymnasium graphs into Deep Reinforcement Learning (DRL) observations and discrete actions. Graph Convolutional Networks (GCNs) encode the graph's local and global state, which feeds the distinct policy and critic heads of an Advantage Actor-Critic (A2C) model. The central contribution of this work is the design of a novel deep graphical RL environment that automates the translation of sequential user and system events, furnishing crucial context for cybersecurity analysis. This work provides a foundation for future research into tuning training parameters and advanced reward shaping, while also offering insight into which system event attributes are critical to training autonomous RL agents.
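The interaction loop such an environment implies can be sketched as follows. This is a toy, dependency-free stand-in and not SubstratumGraphEnv itself: it only mirrors the return conventions of Gymnasium's `reset` (observation, info) and `step` (observation, reward, terminated, truncated, info). The class name `ProcessGraphEnv`, the action meaning (index of the child process to move to), the reward values, the target node, and the termination rules are all assumptions made for illustration.

```python
class ProcessGraphEnv:
    """Toy environment that follows Gymnasium's reset/step return conventions.

    Not the paper's SubstratumGraphEnv: the action, reward, and termination
    rules below are illustrative assumptions.
    """

    def __init__(self, children, root, target, max_steps=10):
        self.children = children    # parent node -> list of child nodes
        self.root = root
        self.target = target        # hypothetical "malicious" process node
        self.max_steps = max_steps

    def _obs(self):
        # Observation: current node plus the children the agent may move to.
        kids = self.children.get(self.current, [])
        return {"node": self.current, "children": list(kids)}

    def reset(self, seed=None, options=None):
        self.current = self.root
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        kids = self.children.get(self.current, [])
        self.steps += 1
        if 0 <= action < len(kids):
            self.current = kids[action]  # follow the chosen parent -> child edge
            reward = 1.0 if self.current == self.target else 0.0
        else:
            reward = -1.0                # assumed penalty for an invalid action
        # Episode ends at the target or at a leaf process with no children.
        at_leaf = not self.children.get(self.current)
        terminated = self.current == self.target or at_leaf
        truncated = not terminated and self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}


children = {
    "explorer.exe": ["winword.exe", "chrome.exe"],
    "winword.exe": ["powershell.exe"],
}
env = ProcessGraphEnv(children, root="explorer.exe", target="powershell.exe")
obs, info = env.reset()
total = 0.0
done = False
while not done:
    # Placeholder policy: always take the first child. A trained A2C agent
    # would instead sample the action from its policy head.
    obs, reward, terminated, truncated, info = env.step(0)
    total += reward
    done = terminated or truncated
```

In a full pipeline, the dictionary observation would be replaced by graph tensors (node features and an edge index) that a GCN encoder can consume, which is the role the abstract assigns to SubstratumBridge.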