The paper introduces CAM-LDS, a new labeled dataset of system logs and security alerts. It spans seven attack scenarios that cover 81 attack techniques across 13 tactics, and it is designed to facilitate research on LLM-based log analysis. The data were collected from 18 distinct sources within a fully open-source and reproducible test environment. In an illustrative case study, an LLM processing CAM-LDS predicted the correct attack technique perfectly for approximately one third of attack steps and adequately for another third, indicating the potential of LLMs for log interpretation.
CAM-LDS, a new dataset of labeled system logs and security alerts, supports the evaluation of LLMs for automated, domain-agnostic cyberattack analysis.
Log data are essential for intrusion detection and forensic investigations. However, manual log analysis is tedious due to high data volumes, heterogeneous event formats, and unstructured messages. Although many automated methods for log analysis exist, they usually still rely on domain-specific configurations such as expert-defined detection rules, handcrafted log parsers, or manual feature engineering. Crucially, the level of automation that conventional methods achieve is limited by their inability to semantically understand logs and explain their underlying causes. In contrast, large language models (LLMs) enable domain- and format-agnostic interpretation of system logs and security alerts. Unfortunately, research on this topic remains challenging because publicly available, labeled data sets covering a broad range of attack techniques are scarce. To address this gap, we introduce the Cyber Attack Manifestation Log Data Set (CAM-LDS), which comprises seven attack scenarios covering 81 distinct techniques across 13 tactics, with data collected from 18 distinct sources within a fully open-source and reproducible test environment. We extract the log events that directly result from attack executions to facilitate analysis of attack manifestations in terms of command observability, event frequencies, performance metrics, and intrusion detection alerts. We further present an illustrative case study in which an LLM processes the CAM-LDS. The results indicate that the correct attack techniques are predicted perfectly for approximately one third of attack steps and adequately for another third, highlighting the potential of LLM-based log interpretation and the utility of our data set.