EvoSkills is introduced as a framework for autonomous generation of complex, multi-file skills for LLM agents, addressing the limitations of manual skill authoring and human-machine cognitive misalignment. It couples a Skill Generator, which iteratively refines skills, with a co-evolving Surrogate Verifier that provides feedback without relying on ground-truth test content. Experiments on SkillsBench demonstrate that EvoSkills achieves state-of-the-art performance on Claude Code and Codex, and generalizes well to other LLMs.
LLM agents can now autonomously generate complex skills with multi-file dependencies, rivaling human-authored skills, thanks to a co-evolutionary verification process that doesn't need ground truth labels.
Anthropic proposes the concept of skills for LLM agents to tackle multi-step professional tasks that simple tool invocations cannot address. A tool is a single, self-contained function, whereas a skill is a structured bundle of interdependent multi-file artifacts. Currently, skill generation is not only labor-intensive due to manual authoring, but may also suffer from human-machine cognitive misalignment, which can degrade agent performance, as evidenced by evaluations on SkillsBench. We therefore aim to enable agents to autonomously generate skills. However, existing self-evolving methods designed for tools cannot be directly applied to skills because of their greater complexity. To address these issues, we propose EvoSkills, a self-evolving skills framework that enables agents to autonomously construct complex, multi-file skill packages. Specifically, EvoSkills couples a Skill Generator, which iteratively refines skills, with a Surrogate Verifier that co-evolves to provide informative and actionable feedback without access to ground-truth test content. On SkillsBench, EvoSkills achieves the highest pass rate against five baselines on both Claude Code and Codex, and also generalizes strongly to six additional LLMs.