ChainCaps introduces a runtime rule for tool-using agents that mitigates "permission laundering," in which an agent satisfies every per-tool permission check yet still produces an unsafe end-to-end effect through tool composition. The rule enforces monotonic capability attenuation: each value carries a sink-specific capability budget, and budgets are intersected during tool composition, so a value cannot gain authority as it moves through a tool chain. Implemented as a transparent MCP proxy, ChainCaps reduces attack success rate from 25-68% to 0-4.8% on 82 tasks across five frontier models while preserving 96-100% benign completion.
Tool-using agents can be tricked into leaking sensitive data even when each individual tool use seems safe – ChainCaps stops this "permission laundering" with a simple, effective runtime check.
Tool-using agents increasingly operate in open-ended deployment environments, where they compose file systems, web APIs, code interpreters, and enterprise services at runtime. This creates a safety gap in tool composition: an agent can satisfy every per-tool permission check and still produce an unsafe end-to-end effect, such as reading a confidential document, summarizing it, and sending the summary to an external endpoint. We call this failure mode permission laundering. ChainCaps addresses it with a runtime rule: every value carries a sink-specific capability budget, and tool composition propagates budgets by intersection. A value can preserve or lose authority as it moves through a tool chain, but it cannot gain new authority through composition. We implement ChainCaps as a transparent MCP proxy that requires no changes to the agent or tool servers. On 82 tasks across five frontier models from three providers, ChainCaps reduces attack success rate from 25-68% to 0-4.8% while preserving 96-100% benign completion. In replay experiments, it also outperforms scalar-IFC and per-function-isolation baselines. Manifest quality is the dominant deployment bottleneck: expert manifests reach 100% attack blocking, while naive manifests fall to 27.3%. Our claims are limited to explicit-flow composition safety under trusted manifests and proxy-visible data movement, a practical gap in deployed tool-using agents today.
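The budget-intersection rule described above can be illustrated with a minimal sketch. This is not the ChainCaps implementation. It assumes a budget can be modeled as the set of sinks a value may still flow to, and the names `Labeled`, `compose`, `send`, and the sink labels are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical sink names, for illustration only (not from the paper).
ALL_SINKS = frozenset({"local_file", "internal_api", "external_http"})


@dataclass(frozen=True)
class Labeled:
    """A value paired with the set of sinks it is still allowed to reach."""
    value: str
    budget: frozenset


def compose(tool, *inputs, tool_budget=ALL_SINKS):
    """Apply `tool` to labeled inputs and label the result.

    The output budget is the intersection of the tool's own budget and
    every input budget, so it can shrink or stay the same, never grow.
    """
    budget = frozenset(tool_budget)
    for item in inputs:
        budget &= item.budget
    result = tool(*(item.value for item in inputs))
    return Labeled(result, budget)


def send(sink, item):
    """Release a value to a sink only if its budget still includes that sink."""
    if sink not in item.budget:
        raise PermissionError(f"value may not flow to {sink!r}")
    return f"sent to {sink}"


# A confidential document that may only stay local or go to internal services.
doc = Labeled("Q3 acquisition plan ...", frozenset({"local_file", "internal_api"}))

# Summarizing does not widen the budget: the summary inherits the restriction.
summary = compose(lambda text: text[:20], doc)

try:
    send("external_http", summary)
except PermissionError as err:
    print("blocked:", err)
```

In this sketch, the read-summarize-send chain from the abstract fails at the final step because the summary inherits the document's budget, even though each step would pass a per-tool check on its own.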