This paper presents a systematic study of model-generated skills, examining their full lifecycle from experience generation to skill consumption across five diverse agentic task domains. The authors find that model-generated skills are beneficial on average but exhibit notable negative transfer, and that a model's effectiveness as a skill extractor or consumer varies independently of model scale and baseline task strength. To mitigate these issues, they develop a meta-skill that steers skill extraction toward features tied to actual utility, which improves skill quality and reduces negative transfer.
Model-generated skills can actually hurt agent performance, and bigger models don't necessarily make for better skill extractors or consumers.
Language agents increasingly improve by reusing skills: structured procedural artifacts distilled from past experience. Domain-level and model-generated skills are especially promising. They enable fast adaptation within a domain by encoding recurring domain-specific procedures, and they scale beyond labor-intensive hand-crafting. However, while extraction methods continue to proliferate, understanding remains limited: no comprehensive study spans the full skill lifecycle (experience generation, skill extraction, and skill consumption) to ask whether such skills actually work, when they work, and what makes them succeed or fail. To close this gap, we build a utility-grounded evaluation framework and report systematic experimental results across extractors and target agents in five diverse agentic task domains. We find that model-generated skills are beneficial on average but exhibit non-trivial negative transfer, and that neither extractors nor targets behave uniformly. A model can be a strong extractor yet a weak consumer, or vice versa, and skill utility is independent of model scale or baseline task strength. To explain these patterns, we dissect each lifecycle stage in depth, analyzing how experience composition shapes skill quality, what properties characterize useful skills, and how the same skill transfers across different consumers. Finally, we translate these findings into a concrete meta-skill that guides skill extraction toward the features tied to actual utility; it consistently improves skill quality across domains and substantially reduces negative transfer.
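The two headline quantities in the abstract, per-pair skill utility and negative transfer, can be illustrated with a small sketch. It assumes, since the text does not specify the exact metric, that utility is the change in a consumer's task success rate when it is given an extractor's skills, and that negative transfer means an (extractor, consumer) pair whose utility falls below zero. The function names `skill_utility` and `summarize` and all success rates below are hypothetical toy values for illustration, not results from the paper.

```python
from statistics import mean


def skill_utility(with_skill: float, without_skill: float) -> float:
    """Assumed utility metric: change in success rate when skills are provided."""
    return with_skill - without_skill


def summarize(baseline, with_skills):
    """Summarize utility over an extractor x consumer grid.

    baseline:    consumer -> success rate without any skills
    with_skills: (extractor, consumer) -> success rate using that extractor's skills
    """
    utilities = {
        pair: skill_utility(rate, baseline[pair[1]])
        for pair, rate in with_skills.items()
    }
    extractors = sorted({e for e, _ in utilities})
    consumers = sorted({c for _, c in utilities})
    return {
        "mean_utility": mean(utilities.values()),
        # Fraction of pairs where skills hurt (assumed definition of negative transfer).
        "negative_transfer_rate": sum(u < 0 for u in utilities.values()) / len(utilities),
        # Average utility a model's skills provide to all consumers (extractor role).
        "by_extractor": {
            e: mean(u for (x, _), u in utilities.items() if x == e) for e in extractors
        },
        # Average utility a model gains from all extractors' skills (consumer role).
        "by_consumer": {
            c: mean(u for (_, y), u in utilities.items() if y == c) for c in consumers
        },
    }


# Hypothetical toy numbers, not taken from the paper.
BASELINE = {"model_a": 0.60, "model_b": 0.45}
WITH_SKILLS = {
    ("model_a", "model_a"): 0.66,
    ("model_a", "model_b"): 0.41,  # skills hurt this consumer: negative transfer
    ("model_b", "model_a"): 0.63,
    ("model_b", "model_b"): 0.52,
}

report = summarize(BASELINE, WITH_SKILLS)
print(report)
```

With these toy numbers, `model_b` has the higher average utility as an extractor while `model_a` gains more as a consumer, the kind of extractor/consumer asymmetry the abstract describes. Averaging the grid by row and by column is one simple way to separate the two roles; the paper's own framework may aggregate differently.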