Uni-Skill is a unified skill-centric framework for robotic manipulation that integrates skill-aware planning with automatic skill evolution. It addresses the limitations of fixed skill libraries by requesting new skill implementations when existing ones are inadequate, which keeps planning adaptable. The framework draws on SkillFolder, a VerbNet-inspired repository derived from large-scale unstructured robotic videos, to provide semantic supervision and fine-grained references for few-shot skill inference. The authors report state-of-the-art performance in both simulation and real-world experiments.
Forget manually annotating robot skills: Uni-Skill automatically evolves a skill library from unstructured video, enabling zero-shot generalization to novel manipulation tasks.
While skill-centric approaches leverage foundation models to enhance generalization in compositional tasks, they often rely on fixed skill libraries, limiting adaptability to new tasks without manual intervention. To address this, we propose Uni-Skill, a Unified Skill-centric framework that supports skill-aware planning and facilitates automatic skill evolution. Unlike prior methods that restrict planning to predefined skills, Uni-Skill requests new skill implementations when existing ones are insufficient, ensuring adaptable planning with a self-augmented skill library. To support automatic implementation of the diverse skills requested by the planning module, we construct SkillFolder, a VerbNet-inspired repository derived from large-scale unstructured robotic videos. SkillFolder introduces a hierarchical skill taxonomy that captures diverse skill descriptions at multiple levels of abstraction. By populating this taxonomy with large-scale, automatically annotated demonstrations, Uni-Skill shifts the paradigm of skill acquisition from inefficient manual annotation to efficient offline structural retrieval. Retrieved examples provide semantic supervision over behavior patterns and fine-grained references for spatial trajectories, enabling few-shot skill inference without deployment-time demonstrations. Comprehensive experiments in both simulation and real-world settings verify the state-of-the-art performance of Uni-Skill over existing VLM-based skill-centric approaches, highlighting its advanced reasoning capabilities and strong zero-shot generalization across a wide range of novel tasks.
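The interplay between planning and retrieval described above can be illustrated with a toy sketch. This is not the authors' implementation: the names `SkillNode`, `retrieve`, and `plan_step`, the path-based taxonomy, and the string demo identifiers are all hypothetical, and storing retrieved references in the library merely stands in for the few-shot skill inference step. The sketch assumes a skill is addressed by a coarse-to-fine path of verb classes and that retrieval backs off to a coarser level when the fine-grained one has no demonstrations.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SkillNode:
    """One node in a hypothetical coarse-to-fine skill taxonomy."""
    name: str
    demos: list[str] = field(default_factory=list)  # demo clip identifiers
    children: dict[str, SkillNode] = field(default_factory=dict)

    def add(self, path: list[str], demo: str) -> None:
        """Store a demo under the given path, creating nodes as needed."""
        node = self
        for part in path:
            node = node.children.setdefault(part, SkillNode(part))
        node.demos.append(demo)

    def retrieve(self, path: list[str], k: int = 2) -> list[str]:
        """Return up to k demos from the deepest node on the path that has
        any, backing off to coarser levels when finer ones are missing."""
        node, best = self, list(self.demos)
        for part in path:
            node = node.children.get(part)
            if node is None:
                break
            if node.demos:
                best = node.demos
        return best[:k]


def plan_step(path: list[str], library: dict[str, list[str]],
              taxonomy: SkillNode) -> str:
    """Reuse a skill if the library has it; otherwise request a new one.

    Assumption: the retrieved references are stored directly as a
    placeholder for an implementation inferred from them.
    """
    name = "/".join(path)
    if name in library:
        return f"use:{name}"
    library[name] = taxonomy.retrieve(path)
    return f"new:{name}"


taxonomy = SkillNode("root")
taxonomy.add(["put"], "demo_put_0")
taxonomy.add(["put", "place"], "demo_place_0")

library: dict[str, list[str]] = {}
print(plan_step(["put", "insert"], library, taxonomy))  # new:put/insert
print(plan_step(["put", "insert"], library, taxonomy))  # use:put/insert
```

In this toy version the library grows only when the planner asks for a skill it lacks, which mirrors the self-augmenting behavior the abstract describes. The backoff rule is one simple design choice among many for exploiting multiple levels of abstraction.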