Tsinghua AICASNYUProject LeadsApr 15, 2026arXiv:2604.13630

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Xixun Lin, Yancheng Chen, Yongxuan Wu, Yucheng Ning, Yucheng Ning, Yilong Liu, Nan Sun, Shun Zhang, Bin Chong, Chuan Zhou, Yanan Cao, Li Guo, Li Guo

AI Summary

SafeHarness is a novel security architecture that integrates four defense layers—adversarial context filtering, tiered causal verification, privilege-separated tool control, and safe rollback with adaptive degradation—directly into the LLM agent lifecycle. This approach addresses the limitations of existing security measures by coordinating defenses across different phases of agent operation and monitoring harness-internal state. Evaluations on benchmark datasets demonstrate that SafeHarness reduces unsafe behavior rate by 38% and attack success rate by 42% compared to unprotected baselines, while maintaining task utility.

Key Contribution

LLM agent harnesses are surprisingly vulnerable, but weaving security directly into the agent lifecycle can slash attack success by 42% without sacrificing utility.

Abstract

The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from structural mismatch, leaving them blind to harness-internal state and unable to coordinate across the different phases of agent operation. In this paper, we introduce \safeharness{}, a security architecture in which four proposed defense layers are woven directly into the agent lifecycle to address above significant limitations: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action execution, and safe rollback with adaptive degradation at state update. The proposed cross-layer mechanisms tie these layers together, escalating verification rigor, triggering rollbacks, and tightening tool privileges whenever sustained anomalies are detected. We evaluate \safeharness{} on benchmark datasets across diverse harness configurations, comparing against four security baselines under five attack scenarios spanning six threat categories. Compared to the unprotected baseline, \safeharness{} achieves an average reduction of approximately 38\% in UBR and 42\% in ASR, substantially lowering both the unsafe behavior rate and the attack success rate while preserving core task utility.

Architecture Design (Transformers, SSMs, MoE)Red-Teaming & Adversarial Robustness Tool Use & Agents

Citation Metrics

Citations0

Influential citations0

References47

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Related Papers