The paper introduces Pen-Strategist, a framework designed to improve automated penetration testing by strengthening strategy formulation and action selection. The authors fine-tuned a Qwen-3-14B model with reinforcement learning on a novel dataset of logical explanations for pentesting strategies, achieving an 87% improvement in strategy derivation over the baseline. Integrating Pen-Strategist into existing frameworks such as PentestGPT yielded a 47.5% improvement in subtask completion on vulnerable machines and outperformed the GPT-5 baseline, demonstrating its effectiveness in practical pentesting scenarios.
LLMs can now formulate significantly better penetration testing strategies, outperforming even GPT-5, thanks to a novel reasoning framework and targeted fine-tuning.
Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, which makes robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing with LLM-based agents, existing frameworks often perform poorly because of limited capabilities in strategy formulation, domain-specific reasoning, and accurate action and tool selection. To overcome these limitations, we propose the Pen-Strategist framework, consisting of a novel domain-specific reasoning model that derives pentesting strategies via logical reasoning and a classifier that converts those strategies into actionable steps. First, we construct a reasoning dataset containing logical explanations for both strategy derivation and step selection in pentesting scenarios. We then fine-tune a Qwen-3-14B model for strategy generation using reinforcement learning. Evaluation on the test split of the dataset demonstrates an 87% improvement in strategy derivation performance compared to the baseline. Furthermore, we integrate the fine-tuned Pen-Strategist model into existing automated pentesting frameworks, such as PentestGPT, and evaluate its performance on vulnerable machines, achieving a 47.5% improvement in subtask completion while surpassing the GPT-5 baseline. Further experiments on the CTFKnow benchmark show an 18% performance gain over the base model. For step prediction, we train a semantic-based CNN classifier, which outperforms commercial LLMs by 28% and improves execution stability. Finally, we conduct a user study to qualitatively assess the generated strategies, in which Pen-Strategist outperforms Claude-4.6-Sonnet.
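The abstract says only that step prediction uses a "semantic-based CNN classifier" that maps a generated strategy to an actionable step. As a rough illustration, the sketch below shows the forward pass of a generic text CNN: convolution over word windows, ReLU, max-over-time pooling, a linear layer, and softmax. It is not the authors' architecture, and it rests on several assumptions. The step labels in the hypothetical `STEPS` list are invented. The hypothetical `embed` function is a deterministic stand-in for the pretrained semantic embeddings such a classifier would presumably use. All weights are random and untrained, so the predicted label carries no meaning until the model is trained on labeled strategy/step pairs.

```python
import math
import random

# Hypothetical step labels; the abstract does not list the real label set.
STEPS = ["port_scan", "web_enumeration", "exploit_service", "privilege_escalation"]


def embed(tokens, dim, seed=0):
    """Hypothetical stand-in for semantic embeddings: one deterministic
    pseudo-random vector per token (a real system would use a pretrained encoder)."""
    vecs = []
    for tok in tokens:
        rng = random.Random(f"{seed}:{tok}")  # string seeds are deterministic across runs
        vecs.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vecs


def conv1d_maxpool(x, filters, width):
    """Slide each filter over `width`-token windows, apply ReLU, then max-pool over time."""
    features = []
    for w, b in filters:
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is the floor (also used if text is too short)
        for start in range(len(x) - width + 1):
            window = [v for vec in x[start:start + width] for v in vec]  # flatten the window
            act = max(0.0, sum(wi * vi for wi, vi in zip(w, window)) + b)
            best = max(best, act)
        features.append(best)
    return features


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TextCNN:
    """Untrained text-CNN classifier mapping a strategy string to one step label."""

    def __init__(self, labels, dim=8, width=2, n_filters=6, seed=42):
        rng = random.Random(seed)
        self.labels, self.dim, self.width = labels, dim, width
        # Each filter spans `width` consecutive token embeddings.
        self.filters = [([rng.gauss(0.0, 0.3) for _ in range(width * dim)], 0.0)
                        for _ in range(n_filters)]
        # Linear output layer: one row of weights per label.
        self.out_w = [[rng.gauss(0.0, 0.3) for _ in range(n_filters)] for _ in labels]
        self.out_b = [0.0] * len(labels)

    def predict_proba(self, text):
        tokens = text.lower().split()
        feats = conv1d_maxpool(embed(tokens, self.dim), self.filters, self.width)
        logits = [sum(w * f for w, f in zip(row, feats)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        return softmax(logits)

    def predict(self, text):
        probs = self.predict_proba(text)
        return self.labels[max(range(len(probs)), key=probs.__getitem__)]


if __name__ == "__main__":
    clf = TextCNN(STEPS)
    strategy = "Enumerate the web server on port 80 for hidden directories"
    print(clf.predict(strategy), clf.predict_proba(strategy))
```

Keeping strategy generation (the fine-tuned LLM) separate from a small fixed-label classifier means the final action is always drawn from a closed set of steps. That is one plausible reading of why the abstract credits the classifier with improved execution stability.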