CMU ML | Apr 16, 2026 | arXiv:2604.15579

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Yining Hong, Yining She, Eunsuk Kang, C. Timperley, Christian Kastner

AI Summary

This paper investigates symbolic guardrails as a means of providing strong safety and security guarantees for AI agents that interact with their environments through tools, particularly in high-stakes business settings. The authors systematically review 80 agent safety and security benchmarks, analyze which policy requirements can be enforced by symbolic guardrails, and evaluate the impact of these guardrails on safety, security, and agent success on three benchmarks. The study finds that 85% of the benchmarks lack concrete policies, and that among the policies that are specified, symbolic guardrails can enforce 74% of the requirements, improving safety and security without compromising agent utility.

Key Contribution

Symbolic guardrails offer a simple, low-cost way to guarantee some safety and security requirements for domain-specific AI agents without sacrificing agent utility.

Abstract

AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can cause unacceptable harm, such as privacy breaches and financial loss. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward strong safety and security guarantees for AI agents. Our three-part study includes a systematic review of 80 state-of-the-art agent safety and security benchmarks to identify the policies they evaluate, an analysis of which policy requirements can be guaranteed by symbolic guardrails, and an evaluation of how symbolic guardrails affect safety, security, and agent success on τ²-Bench, CAR-bench, and MedAgentBench. We find that 85% of benchmarks lack concrete policies, relying instead on underspecified high-level goals or common sense. Among the specified policies, 74% of policy requirements can be enforced by symbolic guardrails, often using simple, low-cost mechanisms. These guardrails improve safety and security without sacrificing agent utility. Overall, our results suggest that symbolic guardrails are a practical and effective way to guarantee some safety and security requirements, especially for domain-specific AI agents. We release all code and artifacts at https://github.com/hyn0027/agent-symbolic-guardrails.
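To make the idea of a symbolic guardrail concrete, here is a minimal sketch of a deterministic check placed between an agent and its tools. It is not the authors' implementation; the names `GuardedTools`, `refund_limit_rule`, `issue_refund`, and the refund limit are hypothetical and chosen only for illustration. The point is that the rule is ordinary code evaluated on every proposed tool call, so a violating call is blocked regardless of what the model outputs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# A rule inspects a proposed tool call and returns a reason string if the
# call must be blocked, or None if the rule has no objection.
Rule = Callable[[str, Dict[str, Any]], Optional[str]]


class PolicyViolation(Exception):
    """Raised when a proposed tool call breaks a symbolic policy rule."""


@dataclass
class GuardedTools:
    """Hypothetical wrapper: every tool call passes through the rules first."""
    tools: Dict[str, Callable[..., Any]]
    rules: List[Rule]

    def call(self, name: str, **args: Any) -> Any:
        # Allow-list check: only registered tools can be invoked at all.
        if name not in self.tools:
            raise PolicyViolation(f"tool not allowed: {name}")
        # Deterministic policy checks run before the tool executes.
        for rule in self.rules:
            reason = rule(name, args)
            if reason is not None:
                raise PolicyViolation(reason)
        return self.tools[name](**args)


# Assumed example policy: the agent may never refund more than a fixed limit.
REFUND_LIMIT = 100.0


def refund_limit_rule(name: str, args: Dict[str, Any]) -> Optional[str]:
    if name == "issue_refund" and args.get("amount", 0) > REFUND_LIMIT:
        return f"refund exceeds limit of {REFUND_LIMIT}"
    return None


def issue_refund(order_id: str, amount: float) -> str:
    # Stand-in for a real side-effecting tool.
    return f"refunded {amount:.2f} for {order_id}"


guarded = GuardedTools(
    tools={"issue_refund": issue_refund},
    rules=[refund_limit_rule],
)
```

In this sketch, `guarded.call("issue_refund", order_id="A1", amount=20.0)` runs the tool, while the same call with `amount=500.0` raises `PolicyViolation` before the tool executes. A real deployment would route the agent's tool-calling interface through a wrapper like this and decide how to report the violation back to the agent.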

Constitutional AI & AI Ethics | Red-Teaming & Adversarial Robustness | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 72

Year: 2026

Venue: N/A
