Stanford HAI, Ant Group, BJUT, HKU, HKUST, Hunan, Institute of Science Tokyo, Oxford, UNC, XJTU, ZJU

Jun 1, 2026

arXiv:2606.02302

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Hao Cheng, Changtao Miao, Tianle Song, Yin Wu, He Liu, Erjia Xiao, Junchi Chen, Xiaoyu Shi, Yichi Wang, Jing Yang, Taowen Wang, Jinhao Duan, Mengshu Sun, Peiyan Dong, Xuan Shen, Yang Cao, Renjing Xu, Jindong Gu, Bo Zhang, Jize Zhang, Chenhao Lin, Philip Torr, Chao Shen

AI Summary

This paper introduces SeClaw, a framework that synthesizes security tasks for evaluating autonomous LLM agents in stateful environments. It targets a gap in existing benchmarks, which focus on final outcomes rather than the execution processes that lead to them. Specification-driven task synthesis lets SeClaw construct security tasks at scale from structured risk specifications, and a standardized testbed supports assessing agent behavior across safety-risk scenarios. The authors position SeClaw as broadening coverage of emerging threats and enabling trajectory-aware evaluation of unsafe actions, which in turn supports diagnosing and comparing security failures in autonomous agents.

Key Contribution

SeClaw addresses the limits of existing benchmarks in capturing how agents behave during execution, pairing spec-driven task synthesis with trajectory-aware evaluation of security risks in autonomous systems.

Abstract

Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent security benchmarks often rely on manually curated tasks, provide limited coverage of emerging threats, and focus primarily on final outcomes rather than the execution processes that lead to unsafe behavior. We introduce SeClaw, a framework that combines specification-driven security task synthesis with execution-based security evaluation for autonomous agents. Spec-driven security task synthesis enables scalable and controllable construction of security tasks from structured risk specifications, while SeClaw Docker provides a standardized testbed for evaluating agent behavior under diverse safety-risk scenarios. The benchmark covers risks arising from resources, user tasks, environments, and intrinsic agent behaviors, and supports trajectory-aware assessment of unsafe actions beyond final responses. By bridging systematic task synthesis and reproducible security evaluation, SeClaw provides a practical foundation for measuring, diagnosing, and comparing security failures in autonomous LLM agents. The code is available at https://github.com/seclaw-eval/seclaw-eval.
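To make the two ideas in the abstract concrete, the following is a minimal sketch of synthesizing tasks from structured risk specifications and then checking an agent trajectory step by step instead of only at the final response. Everything here is an illustrative assumption: the `RiskSpec` fields, the `Step` record, the tool names, and the helper functions are hypothetical and are not taken from the SeClaw codebase or its specification format.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical schema: field names are illustrative assumptions,
# not SeClaw's actual specification format.
@dataclass(frozen=True)
class RiskSpec:
    source: str            # where the risk arises, e.g. "resource" or "environment"
    threat: str            # short label for the risk being tested
    forbidden_action: str  # tool call treated as unsafe if it ever occurs

@dataclass
class Step:
    tool: str              # name of the tool the agent invoked
    argument: str = ""     # argument passed to that tool

def synthesize_tasks(sources, threats):
    """Enumerate one spec per (source, threat) pair.

    `threats` is a list of (threat_label, forbidden_action) tuples.
    """
    return [
        RiskSpec(source=s, threat=t, forbidden_action=a)
        for s, (t, a) in product(sources, threats)
    ]

def first_unsafe_step(spec, trajectory):
    """Trajectory-aware check: index of the first unsafe step, or None."""
    for i, step in enumerate(trajectory):
        if step.tool == spec.forbidden_action:
            return i
    return None

def outcome_only_check(spec, trajectory):
    """Outcome-only check: inspects just the final step."""
    return bool(trajectory) and trajectory[-1].tool == spec.forbidden_action

# Usage with made-up risk sources, threats, and tool names.
specs = synthesize_tasks(
    ["resource", "environment"],
    [("prompt_injection", "send_email"), ("data_exfiltration", "upload_file")],
)
trajectory = [
    Step("read_file", "notes.txt"),
    Step("send_email", "attacker@example.com"),  # unsafe action mid-run
    Step("respond", "Done."),                    # final response looks benign
]
unsafe_index = first_unsafe_step(specs[0], trajectory)
flagged_by_outcome = outcome_only_check(specs[0], trajectory)
```

In this toy run the unsafe `send_email` call happens in the middle of the trajectory, so the step-level check flags it while the outcome-only check does not. That is the kind of gap the abstract attributes to benchmarks that focus on final outcomes.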

Eval Frameworks & Benchmarks | Red-Teaming & Adversarial Robustness | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
