Fudan | Shanghai Innovation | Jun 11, 2026 | arXiv:2606.13079

The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems

Jiaqi Luo, Jiarun Dai, Zhile Chen, Jia Xu, Weibing Wang, Yawen Duan, Brian Tse, Geng Hong, Xu Pan, Xudong Pan, Yuan Zhang, Min Yang

AI Summary

This study evaluates the autonomous penetration capabilities of large language model (LLM)-powered AI systems using a novel evaluation framework that includes diverse target servers and a general-purpose agent architecture. By assessing 19 LLMs in realistic scenarios, the research reveals penetration success rates between 10.7% and 69.3%, highlighting a correlation between model capability and penetration effectiveness. These findings underscore the potential risks posed by advanced AI systems in executing cyberattacks without human oversight, emphasizing the need for robust safety measures.

Key Contribution

Current LLMs can autonomously penetrate systems with success rates up to 69.3%, revealing alarming implications for cybersecurity.

Abstract

The autonomous execution of cyberattacks capable of causing substantial real-world harm is widely regarded as one of the critical red lines that frontier AI systems must not cross. Within this broader red-line scenario, autonomous penetration is a core enabling capability and subtask: the ability of LLM-powered AI systems to independently conduct adversarial operations against a target server without human intervention, identify and exploit vulnerabilities, and obtain unauthorized access or control. A growing body of work has sought to assess the autonomous penetration capabilities of AI systems. However, existing evaluations often employ opaque methodologies, rely on unrealistic or overly simplified penetration-testing scenarios, or provide LLMs with excessive prior knowledge and task-specific guidance. As a result, they cannot accurately capture the extent to which modern AI systems can autonomously perform this core capability within broader high-impact cyberattack scenarios. To address these limitations, we construct a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding. On the target-server side, we design two levels of target environments based on the number of secure services (services without known vulnerabilities) deployed alongside a vulnerable service: Tier 1 (one secure service) and Tier 2 (three secure services), resulting in a total of 300 target servers. The agent scaffolding adopts a general-purpose agent architecture equipped with a set of general-purpose cybersecurity tools, without any target-specific prior knowledge. We evaluate 19 open-weight and proprietary LLMs and find that current models achieve penetration success rates ranging from 10.7% to 69.3%. Moreover, we observe that autonomous penetration capability continues to improve alongside advances in overall model capability.
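
The tier definitions and the success-rate metric in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `TargetServer`, `SECURE_SERVICES_PER_TIER`, and `success_rate` are hypothetical, the outcomes in the usage note are made up, and the sketch assumes the reported rate is simply the number of penetrated targets divided by the number of targets attempted.

```python
from dataclasses import dataclass

# Secure (non-vulnerable) services per tier, as stated in the abstract.
SECURE_SERVICES_PER_TIER = {1: 1, 2: 3}


@dataclass(frozen=True)
class TargetServer:
    """Hypothetical record for one target server (names are illustrative)."""
    server_id: str
    tier: int

    @property
    def secure_services(self) -> int:
        return SECURE_SERVICES_PER_TIER[self.tier]

    @property
    def total_services(self) -> int:
        # One vulnerable service plus the tier's secure services.
        return self.secure_services + 1


def success_rate(outcomes: dict[str, bool]) -> float:
    """Percentage of targets penetrated, assuming rate = successes / targets."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    return 100.0 * sum(outcomes.values()) / len(outcomes)
```

For example, with invented outcomes `{"a": True, "b": False, "c": True, "d": True}`, `success_rate` returns 75.0. A Tier 2 target such as `TargetServer("t2", 2)` exposes four services in total, only one of which is vulnerable, which is what makes the agent's reconnaissance step harder than on Tier 1.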

Constitutional AI & AI Ethics | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
