Apr 19, 2026 · arXiv:2604.17517

From Admission to Invariants: Measuring Deviation in Delegated Agent Systems

AI Summary

This paper identifies a structural limitation in enforcement-based governance of autonomous agent systems: a correctly functioning enforcement engine can fail to detect behavioral drift because its signals are local, while the properties that define admissible behavior are global and trajectory-level. The authors' Non-Identifiability Theorem establishes that the admissible behavior space (A0) cannot be inferred from enforcement signals under the Local Observability Assumption, which the authors state every practical enforcement system satisfies. They propose the Invariant Measurement Layer (IML), which detects drift inside the enforcement blind spot. In the reported experiments, enforcement triggers zero violations while IML detects each drift type within 9 to 258 steps.

Key Contribution

Enforcement mechanisms in agent systems can miss significant behavioral drift. The Invariant Measurement Layer detects these deviations with provably finite delay, exposing a blind spot in current governance approaches.

Abstract

Autonomous agent systems are governed by enforcement mechanisms that flag hard constraint violations at runtime. The Agent Control Protocol identifies a structural limit of such systems: a correctly-functioning enforcement engine can enter a regime in which behavioral drift is invisible to it, because the enforcement signal operates below the layer where deviation is measurable. We show that enforcement-based governance is structurally unable to determine whether an agent's behavior remains within the admissible behavior space A0 established at admission time. Our central result, the Non-Identifiability Theorem, proves that A0 is not in the sigma-algebra generated by the enforcement signal g under the Local Observability Assumption, which every practical enforcement system satisfies. The impossibility arises from a fundamental mismatch: g evaluates actions locally against a point-wise rule set, while A0 encodes global, trajectory-level behavioral properties set at admission time. We define the Invariant Measurement Layer (IML), which bypasses this limitation by retaining direct access to the generative model of A0. We prove an information-theoretic impossibility for enforcement-based monitoring; separately, we show IML detects admission-time drift with provably finite detection delay, operating in the region where enforcement is structurally blind. Validated across four settings: three drift scenarios (300 and 1000 steps), a live n8n webhook pipeline, and a LangGraph StateGraph agent -- enforcement triggers zero violations while IML detects each drift type within 9-258 steps. Paper 2 of a 4-paper Agent Governance Series: atomic boundaries (P0, 10.5281/zenodo.19642166), ACP enforcement (P1, arXiv:2603.18829), fair allocation (P3, 10.5281/zenodo.19643928), irreducibility (P4, 10.5281/zenodo.19643950).
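The mismatch between a point-wise enforcement signal and a trajectory-level property can be illustrated with a toy example. The sketch below is not the paper's IML. It assumes a hypothetical three-action rule set, an admission-time action-frequency profile standing in for A0, and a sliding-window total variation distance as the drift measure. The names `ALLOWED`, `BASELINE`, `enforce`, and `first_detection` are all illustrative.

```python
# Toy illustration only: every name and parameter here is an assumption,
# not the paper's formalism or implementation.

ALLOWED = ("read", "write", "notify")  # hypothetical point-wise rule set

# Hypothetical admission-time profile: expected long-run action frequencies.
BASELINE = {"read": 0.8, "write": 0.1, "notify": 0.1}


def enforce(action):
    """Local enforcement signal: 1 if the single action breaks a rule, else 0."""
    return 0 if action in ALLOWED else 1


def frequencies(actions):
    """Empirical frequency of each allowed action in a list of actions."""
    n = len(actions)
    return {a: sum(1 for x in actions if x == a) / n for a in ALLOWED}


def total_variation(p, q):
    """Total variation distance between two distributions over ALLOWED."""
    return 0.5 * sum(abs(p[a] - q[a]) for a in ALLOWED)


def first_detection(trajectory, baseline, window=50, threshold=0.3):
    """Return the first step at which the windowed action mix departs from
    the baseline by more than `threshold`, or None if it never does."""
    for t in range(window, len(trajectory) + 1):
        recent = frequencies(trajectory[t - window:t])
        if total_variation(recent, baseline) > threshold:
            return t
    return None


# Deterministic trajectory: 100 steps matching the baseline mix, then
# 100 steps where "write" dominates. Every individual action is allowed.
normal_block = ["read"] * 8 + ["write", "notify"]
drift_block = ["write"] * 8 + ["read", "notify"]
trajectory = normal_block * 10 + drift_block * 10

violations = sum(enforce(a) for a in trajectory)
detected = first_detection(trajectory, BASELINE)

print("enforcement violations:", violations)
print("drift first flagged at step:", detected)
```

In this toy setting the enforcement signal stays at zero for the whole run because each action is individually permitted, while the windowed comparison against the admission-time profile flags the change some steps after the drift begins. The window size and threshold are arbitrary choices that trade detection delay against sensitivity to noise.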

Constitutional AI & AI Ethics · Scalable Oversight & Alignment Theory · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
