This paper investigates symbolic guardrails as a means to provide strong safety and security guarantees for AI agents that interact with environments through tools, particularly in high-stakes business settings. The authors systematically review 80 agent safety and security benchmarks, analyze which policy requirements symbolic guardrails can enforce, and evaluate the impact of these guardrails on safety, security, and agent success across three benchmarks. The study finds that symbolic guardrails can enforce 74% of the specified policy requirements, improving safety and security without compromising agent utility.
Forget training wheels: symbolic guardrails offer a surprisingly simple and effective way to guarantee safety and security for AI agents in critical domains.
AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can cause unacceptable harm, such as privacy breaches and financial loss. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward strong safety and security guarantees for AI agents. Our three-part study includes a systematic review of 80 state-of-the-art agent safety and security benchmarks to identify the policies they evaluate, an analysis of which policy requirements can be guaranteed by symbolic guardrails, and an evaluation of how symbolic guardrails affect safety, security, and agent success on $\tau^2$-Bench, CAR-bench, and MedAgentBench. We find that 85\% of benchmarks lack concrete policies, relying instead on underspecified high-level goals or common sense. Among the specified policies, 74\% of policy requirements can be enforced by symbolic guardrails, often using simple, low-cost mechanisms. These guardrails improve safety and security without sacrificing agent utility. Overall, our results suggest that symbolic guardrails are a practical and effective way to guarantee some safety and security requirements, especially for domain-specific AI agents. We release all code and artifacts at https://github.com/hyn0027/agent-symbolic-guardrails.
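To make the idea of a "simple, low-cost mechanism" concrete, a symbolic guardrail can be pictured as a deterministic check that runs before a tool call is executed. The following is a minimal sketch under stated assumptions, not the implementation from the released repository: the `issue_refund` tool, the refund cap of 100, and the names `Rule`, `guarded`, and `PolicyViolation` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class PolicyViolation(Exception):
    """Raised when a tool call would break a symbolic policy rule."""


@dataclass(frozen=True)
class Rule:
    name: str                                # human-readable policy identifier
    tool: str                                # name of the tool the rule applies to
    check: Callable[[Dict[str, Any]], bool]  # returns True when the call is allowed


def guarded(tool_name: str, tool_fn: Callable[..., Any], rules: List[Rule]) -> Callable[..., Any]:
    """Wrap a tool so every matching rule is checked before the tool runs."""
    def wrapper(**kwargs: Any) -> Any:
        for rule in rules:
            if rule.tool == tool_name and not rule.check(kwargs):
                # The check is plain code, so a violating call can never reach the tool.
                raise PolicyViolation(f"{rule.name}: blocked call to {tool_name}")
        return tool_fn(**kwargs)
    return wrapper


# Hypothetical tool an agent might call.
def issue_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} for {order_id}"


# Hypothetical policy: refunds must be positive and at most 100.
RULES = [
    Rule("refund-cap", "issue_refund", lambda args: 0 < args.get("amount", 0) <= 100.0),
]

safe_refund = guarded("issue_refund", issue_refund, RULES)

if __name__ == "__main__":
    print(safe_refund(order_id="A1", amount=25.0))
    try:
        safe_refund(order_id="A2", amount=500.0)
    except PolicyViolation as err:
        print(err)
```

Because the rule is evaluated outside the model, the agent's reasoning cannot bypass it, which is the sense in which such a check offers a guarantee rather than a statistical improvement. Policies that depend on underspecified goals or common sense would not fit this form.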