Mar 15, 2026arXiv:2603.14332

Governing Dynamic Capabilities: Cryptographic Binding and Reproducibility Verification for AI Agent Tool Use

AI Summary

This paper addresses the "capability-identity gap" in AI agents that dynamically acquire capabilities at runtime, proposing a framework for detecting unauthorized capability changes post-authorization. The framework uses capability-bound agent certificates, reproducibility commitments based on LLM inference near-determinism, and a verifiable interaction ledger. Experiments demonstrate the framework's ability to detect various attack scenarios with low overhead, outperforming a MCP+OAuth 2.1 baseline.

Key Contribution

Stop silent capability escalation: this framework uses cryptographic binding and reproducibility verification to ensure AI agents only do what they're authorized to do.

Abstract

AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no framework detects when capabilities change post-authorization. We term this the capability-identity gap}: it enables silent capability escalation and violates EU AI Act traceability requirements. We propose three mechanisms. Capability-bound agent certificates extend X.509 v3 with a skills manifest hash; any tool change invalidates the certificate. Reproducibility commitments leverage LLM inference near-determinism for post-hoc replay verification. A verifiable interaction ledger provides hash-linked, signed records for multi-agent forensic reconstruction. We formalize nine security properties and prove they hold under a realistic adversary model. Our Rust prototype achieves 97us certificate verification (<1ns capability binding overhead, ~1,200,000 faster than BAID's zkVM), 0.62ms total governance overhead per tool call (0.1--1.2% of typical latency), and 4.7X separation from cross-provider outputs (Cohen's d > 1.0 on all four metrics), with best classification at F_1=0.876 (Jaccard, θ=0.408); single-provider deployments achieve F_1=0.990 with 11.5 times separation. We evaluate 12 attack scenarios -- silent escalation, tool trojanization, phantom delegation, evidence tampering, collusion, and runtime behavioral attacks validated against NVIDIA's Nemotron-AIQ traces -- each detected with a traceable mechanism, while the MCP+OAuth 2.1 baseline detects none. An end-to-end evaluation over a 5-to-20-agent pipeline with real LLM calls confirms that full governance (G1--G3) adds ~10.8ms per pipeline run (0.12% overhead), scales sub-linearly per agent, and detects all five in-situ attacks with zero false positives.

Constitutional AI & AI Ethics Open-Source Models & Weights Tool Use & Agents

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Governing Dynamic Capabilities: Cryptographic Binding and Reproducibility Verification for AI Agent Tool Use

Related Papers