EvoSkills is introduced as a framework for autonomous generation of complex, multi-file skills for LLM agents, addressing the limitations of manual skill authoring and human-machine cognitive misalignment. It couples a Skill Generator, which iteratively refines skills, with a co-evolving Surrogate Verifier that provides feedback without relying on ground-truth test content. Experiments on SkillsBench demonstrate that EvoSkills achieves state-of-the-art performance on Claude Code and Codex, and generalizes well to other LLMs.
LLM agents can now autonomously generate complex skills with multi-file dependencies, rivaling human-authored skills, thanks to a co-evolutionary verification process that doesn't need ground truth labels.
Anthropic proposes the concept of skills for LLM agents to tackle multi-step professional tasks that simple tool invocations cannot address. A tool is a single, self-contained function, whereas a skill is a structured bundle of interdependent multi-file artifacts. Currently, skill generation is not only labor-intensive due to manual authoring, but may also suffer from human-machine cognitive misalignment, which can degrade agent performance, as evidenced by evaluations on SkillsBench. We therefore aim to enable agents to autonomously generate skills. However, existing self-evolving methods designed for tools cannot be directly applied to skills because of their greater complexity. To address these issues, we propose EvoSkills, a self-evolving skills framework that enables agents to autonomously construct complex, multi-file skill packages. Specifically, EvoSkills couples a Skill Generator, which iteratively refines skills, with a Surrogate Verifier that co-evolves to provide informative and actionable feedback without access to ground-truth test content. On SkillsBench, EvoSkills achieves the highest pass rate against five baselines on both Claude Code and Codex, and also generalizes strongly to six additional LLMs.