Mar 18, 2026 · arXiv:2603.18059

Guardrails as Infrastructure: Policy-First Control for Tool-Orchestrated Workflows

AI Summary

This paper introduces Policy-First Tooling, a model-agnostic permission layer for tool-using automation systems that enforces explicit constraints and risk-aware gating. It defines a compact policy DSL and a runtime architecture that mediates tool invocation and returns an actionable rationale and fix hints for each decision. Experiments across five policy packs and three fault profiles show that stricter policies improve violation prevention and reduce retry amplification, at the cost of lower task success, which makes the safety-utility trade-off explicit.

Key Contribution

Tool-using agents are failing in predictable ways, but a model-agnostic policy layer can measurably improve their safety and reliability, albeit with a clear utility tradeoff.

Abstract

Tool-using automation systems, from scripts and CI bots to agentic assistants, fail in recurring patterns. Common failures include unsafe side effects, invalid arguments, uncontrolled retries, and leakage of sensitive outputs. Many mitigations are model-centric and prompt-dependent, so they are brittle and do not generalize to non-LLM callers. We present Policy-First Tooling, a model-agnostic permission layer that mediates tool invocation through explicit constraints, risk-aware gating, recovery controls, and auditable explanations. The paper contributes a compact policy DSL, a runtime enforcement architecture with actionable rationale and fix hints, and a reproducible benchmark based on trace replay with controlled fault and misuse injection. In 225 controlled runs across five policy packs and three fault profiles, stricter packs improve violation prevention from 0.000 in P0 to 0.681 in P4, while task success drops from 0.356 to 0.067. Retry amplification decreases from 3.774 in P0 to 1.378 in P4, and leakage recall reaches 0.875 under injected secret outputs. These results make safety-utility trade-offs explicit and measurable.
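To make the idea of a permission layer that sits between caller and tool more concrete, here is a minimal Python sketch. It is not the paper's DSL or runtime: the names `Rule`, `Decision`, `PolicyGate`, `risk_of`, and `redact`, the integer risk scale, and the secret-matching regex are all hypothetical choices made for illustration. The sketch only shows the four concerns the abstract lists (argument constraints, risk gating, retry control, output leakage) being checked outside the caller, with every decision carrying a rationale and a fix hint and being appended to an audit log.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Decision:
    allowed: bool
    rationale: str                    # human-readable reason for the decision
    fix_hint: Optional[str] = None    # what the caller could change to pass


@dataclass
class Rule:
    # Hypothetical rule shape; the paper's actual DSL is not reproduced here.
    tool: str
    max_risk: int                                  # highest risk score allowed
    required_args: List[str] = field(default_factory=list)
    max_retries: int = 1                           # retries beyond the first call


class PolicyGate:
    """Caller-agnostic gate: any script, bot, or agent calls check() first."""

    def __init__(self, rules: List[Rule],
                 risk_of: Callable[[str, Dict[str, Any]], int]) -> None:
        self.rules = {r.tool: r for r in rules}
        self.risk_of = risk_of                     # assumed risk-scoring callback
        self.attempts: Dict[str, int] = {}
        self.audit_log: List[Tuple[str, Decision]] = []

    def check(self, tool: str, args: Dict[str, Any]) -> Decision:
        decision = self._evaluate(tool, args)
        self.audit_log.append((tool, decision))    # every decision is recorded
        return decision

    def _evaluate(self, tool: str, args: Dict[str, Any]) -> Decision:
        rule = self.rules.get(tool)
        if rule is None:
            return Decision(False, f"no rule covers tool '{tool}'",
                            "add a rule for this tool or use a covered one")
        missing = [a for a in rule.required_args if a not in args]
        if missing:
            return Decision(False, f"missing required arguments: {missing}",
                            f"supply values for {missing}")
        risk = self.risk_of(tool, args)
        if risk > rule.max_risk:
            return Decision(False, f"risk {risk} exceeds limit {rule.max_risk}",
                            "narrow the arguments or request approval")
        used = self.attempts.get(tool, 0)
        if used >= 1 + rule.max_retries:
            return Decision(False, f"retry budget exhausted after {used} calls",
                            "stop retrying and surface the failure")
        self.attempts[tool] = used + 1
        return Decision(True, "all checks passed")


# Illustrative pattern only; real secret detection would need far more care.
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|token|password)\s*[=:]\s*\S+")


def redact(output: str) -> str:
    """Mask values that follow secret-looking keys in tool output."""
    return SECRET_PATTERN.sub(r"\1=[REDACTED]", output)
```

Because the gate inspects only the tool name and arguments, it behaves the same for an LLM agent and a cron script. Tightening `max_risk` or `max_retries` in such a sketch blocks more calls, which is the same direction of trade-off the abstract reports between violation prevention and task success.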

Constitutional AI & AI Ethics · Red-Teaming & Adversarial Robustness · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 19

Year: 2026

Venue: N/A
