Queen's · Apr 22, 2026 · arXiv:2604.20553

DeepParse: Hybrid Log Parsing with LLM-Synthesized Regex Masks

AI Summary

DeepParse is a hybrid log parsing framework that uses an LLM to synthesize regex masks from small log samples; the Drain algorithm then applies those masks for efficient and accurate log structuring. Separating the reasoning phase (LLM regex synthesis) from the execution phase (Drain parsing) improves scalability and cost-efficiency. Across 16 benchmark datasets, DeepParse achieves 97.6% average Parsing Accuracy, outperforming heuristic and LLM-only baselines. When integrated into an anomaly detection pipeline, it reduces false alarms by over 30% and inference latency by 36% compared to heuristic baselines.

Key Contribution

LLMs can bootstrap accurate and efficient log parsing by synthesizing regex masks, enabling a hybrid approach that outperforms both heuristic and LLM-only methods.

Abstract

Modern distributed systems produce massive, heterogeneous logs essential for reliability, security, and anomaly detection. Converting these free-form messages into structured templates (log parsing) is challenging due to evolving formats and limited labeled data. Machine-learning-based parsers like Drain are fast, but their accuracy often degrades on complex variables, while Large Language Models (LLMs) offer better generalization but incur prohibitive inference costs. This paper presents DeepParse, a hybrid framework that automatically mines reusable variable patterns from small log samples using an LLM, then applies them deterministically through the Drain algorithm. By separating the reasoning phase from execution, DeepParse enables accurate, scalable, and cost-efficient log structuring without relying on brittle handcrafted rules or per-line neural inference. Across 16 benchmark datasets, DeepParse achieves higher accuracy in variable extraction (97.6% average Parsing Accuracy) and better consistency than both heuristic and LLM-only baselines. Integrating DeepParse into an anomaly detection pipeline reduced false alarms by over 30% and inference latency by 36% compared to heuristic baselines.
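The split between reasoning and execution described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `MASKS` list is a hypothetical, hand-written stand-in for regexes an LLM might synthesize from a small log sample, and `group_by_template` is a deliberately simplified substitute for Drain (it groups lines by exact match of the masked text, whereas Drain uses a fixed-depth parse tree with a similarity threshold). The function names and sample log lines are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical masks standing in for LLM-synthesized patterns (assumption:
# these regexes are illustrative and do not come from the paper).
# Order matters: specific patterns must run before the generic number mask,
# otherwise the octets of an IP address would each be replaced by <NUM>.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\bblk_-?\d+\b"), "<BLK>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def apply_masks(line, masks=MASKS):
    """Replace every variable matched by a mask with its placeholder."""
    for pattern, placeholder in masks:
        line = pattern.sub(placeholder, line)
    return line


def group_by_template(lines, masks=MASKS):
    """Simplified stand-in for Drain: group lines whose masked text is identical."""
    groups = defaultdict(list)
    for line in lines:
        groups[apply_masks(line, masks)].append(line)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "Received block blk_-1608999687919862906 of size 91178 from 10.250.19.102",
        "Received block blk_7503483334202473044 of size 233217 from 10.251.215.16",
        "PacketResponder 1 for block blk_38865049064139660 terminating",
    ]
    for template, members in group_by_template(sample).items():
        print(len(members), template)
```

In this sketch the masks are fixed once and then applied with plain regex substitution, so no model call happens per log line; that is the cost argument the abstract makes, shown here only in miniature.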

Distributed Systems & Hardware · Inference & Quantization · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
