Shanghai AI Lab, SJTU | Jun 10, 2026 | arXiv:2606.11543

SkillJuror: Measuring How Agent Skill Organization Changes Runtime Behavior

Zhiyu Chen, Zihan Guo, Bo Huang, Bingwei Lu, Jianghao Lin, Yuanjian Zhou, Weinan Zhang

AI Summary

This study introduces SkillJuror, a framework for evaluating how the organization of procedural knowledge (Skills) affects the runtime behavior of large language model agents. Comparing Progressive Disclosure with a normalized flat baseline on 82 SkillsBench tasks, the authors find that Progressive Disclosure raises the number of distinct Skill resources touched per trajectory from 1.18 to 3.85 and effective uptake events from 1.33 to 3.92, and yields 17 additional verifier-passing trials out of 410 matched trials (+4.1%). The gain is task-dependent: Skill organization changes how agents search and apply procedural knowledge, but outcomes improve mainly when the exposed supporting resources are actionable for the task.

Key Contribution

Skill organization, independent of Skill content, changes agent runtime behavior: Progressive Disclosure yields 17 additional verifier-passing trials out of 410 matched trials (+4.1%) over a normalized flat baseline, with gains that depend on the task.

Abstract

Agent Skills augment large language model (LLM) agents with procedural knowledge at inference time, but current benchmarks rarely distinguish what a Skill says from how it is organized. We study this distinction through Progressive Disclosure, where a concise root file points agents to supporting resources on demand, and compare it with a normalized flat baseline. We present SkillJuror, a framework for evaluating Skill writing paradigms through semantically controlled variants, matched multi-trial evaluations, and trajectory evidence while holding task knowledge fixed. In an 82-task SkillsBench study, Progressive Disclosure changes runtime behavior before aggregate outcomes: distinct Skill resources touched per trajectory rise from 1.18 to 3.85, and effective uptake events rise from 1.33 to 3.92. It also yields 17 additional verifier-passing trials out of 410 matched trials (+4.1%) over the normalized flat baseline. The benefit is task-dependent. Progressive Disclosure helps when supporting resources guide implementation, checking, or repair, but is weaker when success hinges on exact output conventions, numerical thresholds, or long artifact-generation pipelines. These results show that Skill organization is not mere presentation: it can change how agents search and apply procedural knowledge, while outcome gains depend on whether the exposed resources are actionable for the task. Code is available at https://github.com/zhiyuchen-ai/skill-juror.
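The headline numbers in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the SkillJuror code. It assumes that every task received the same number of trials (the abstract gives only the totals, 82 tasks and 410 matched trials) and that "+4.1%" means extra passing trials as a share of all matched trials, i.e. percentage points, since the baseline pass rate is not stated. The function names are hypothetical.

```python
# Figures quoted in the abstract.
TASKS = 82
MATCHED_TRIALS = 410
EXTRA_PASSING_TRIALS = 17


def trials_per_task(trials: int, tasks: int) -> float:
    """Average trials per task (assumes trials are spread evenly over tasks)."""
    return trials / tasks


def gain_in_points(extra_passes: int, trials: int) -> float:
    """Extra passing trials as percentage points of all matched trials."""
    return 100.0 * extra_passes / trials


def fold_change(before: float, after: float) -> float:
    """Ratio of the Progressive Disclosure value to the flat-baseline value."""
    return after / before


if __name__ == "__main__":
    print("trials per task:", trials_per_task(MATCHED_TRIALS, TASKS))
    print("gain (points):", round(gain_in_points(EXTRA_PASSING_TRIALS, MATCHED_TRIALS), 1))
    # Distinct Skill resources touched per trajectory: 1.18 -> 3.85
    print("resources touched, fold change:", round(fold_change(1.18, 3.85), 2))
    # Effective uptake events: 1.33 -> 3.92
    print("uptake events, fold change:", round(fold_change(1.33, 3.92), 2))
```

Under these assumptions the totals work out to five trials per task, and the behavioral metrics grow roughly threefold while the outcome gain stays at about four points, which matches the abstract's framing that runtime behavior changes before aggregate outcomes do.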

Eval Frameworks & Benchmarks | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
