CAS · Apr 28, 2026 · arXiv:2604.25109

Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

Lijia Lv, Xuehai Tang, Jie Wen, Jizhong Han, Songlin Hu

AI Summary

This paper introduces SkillGuard-Robust, a system for pre-load security auditing of Agent Skills, which are reusable capability units comprising code, documentation, and context. SkillGuard-Robust enhances robustness against semantics-preserving rewrites by combining role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication. Evaluated on SkillGuardBench and public-ecosystem extensions, SkillGuard-Robust achieves high accuracy, recall, and consistency in identifying malicious risks across various evaluation views, demonstrating improved robustness in package auditing.

Key Contribution

Pre-load auditing of Agent Skills reaches 97.30% overall exact match and 98.33% malicious-risk recall on a 404-package held-out aggregate, even against semantics-preserving rewrites, by combining role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication.

Abstract

Agent Skills package SKILL.md files, scripts, reference documents, and repository context into reusable capability units, turning pre-load auditing from single-prompt filtering into cross-file security review. Existing guardrails often flag risk but recover malicious intent inconsistently under semantics-preserving rewrites. This paper formulates pre-load auditing for untrusted Agent Skills as a robust three-way classification task and introduces SkillGuard-Robust, which combines role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication. We evaluate SkillGuard-Robust on SkillGuardBench and two public-ecosystem extensions through five large evaluation views ranging from 254 to 404 packages. On the 404-package held-out aggregate, SkillGuard-Robust reaches 97.30% overall exact match, 98.33% malicious-risk recall, and 98.89% attack exact consistency. On the 254-package external-ecosystem view, it reaches 99.66%, 100.00%, and 100.00%, respectively. These results support a bounded conclusion: factorized package auditing materially improves frozen and public-ecosystem robustness, while harsher external-source transfer remains an open challenge.
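The three reported metrics can be illustrated with a minimal sketch. The abstract does not define them, so the definitions below are assumptions rather than the paper's own: overall exact match is taken as the fraction of packages whose predicted label equals the gold label, malicious-risk recall as the fraction of gold-malicious packages predicted malicious, and attack exact consistency as the fraction of attack groups (an attack plus its semantics-preserving rewrites) in which every variant is labeled correctly. The label names, the `Record` type, and the `attack_group` field are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical label set: the abstract says "three-way" but does not name the classes.
LABELS = ("benign", "suspicious", "malicious")


@dataclass(frozen=True)
class Record:
    package_id: str
    gold: str                           # reference label, one of LABELS
    pred: str                           # auditor's label, one of LABELS
    attack_group: Optional[str] = None  # assumed: shared by an attack and its rewrites


def exact_match(records):
    """Fraction of packages whose predicted label equals the gold label."""
    return sum(r.gold == r.pred for r in records) / len(records)


def malicious_recall(records):
    """Fraction of gold-malicious packages that were predicted malicious."""
    malicious = [r for r in records if r.gold == "malicious"]
    return sum(r.pred == "malicious" for r in malicious) / len(malicious)


def attack_exact_consistency(records):
    """Fraction of attack groups in which every variant is labeled correctly."""
    groups = defaultdict(list)
    for r in records:
        if r.attack_group is not None:
            groups[r.attack_group].append(r)
    fully_correct = sum(all(r.pred == r.gold for r in g) for g in groups.values())
    return fully_correct / len(groups)
```

Under these assumed definitions, a single mislabeled rewrite lowers consistency for its whole group even when the other variants are correct, which is why consistency can diverge from recall.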

Constitutional AI & AI Ethics · Red-Teaming & Adversarial Robustness · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 32

Year: 2026

Venue: N/A
