SkillVLA addresses combinatorial diversity in bimanual vision-language-action (VLA) models by enabling skill reuse. The framework disentangles skills across arms so that previously learned single-arm skills can be recombined in novel left-right pairings. Experiments show that SkillVLA substantially improves skill composition, raising the overall success rate from 0% to 51%, and performs strongly on cooperative and long-horizon tasks.
Unlock combinatorial generalization in dual-arm robots by disentangling single-arm skills, enabling reuse and boosting success rates from 0% to 51%.
Recent progress in vision-language-action (VLA) models has demonstrated strong potential for dual-arm manipulation, enabling complex behaviors and generalization to unseen environments. However, mainstream bimanual VLA formulations largely overlook the critical challenge of combinatorial diversity. Different pairings of single-arm behaviors can induce qualitatively distinct task behaviors, yet existing models do not explicitly account for this structure. We argue that effective bimanual VLAs should support skill reuse, that is, the ability to recombine previously learned single-arm skills across novel left-right pairings, thereby avoiding the need to separately learn every possible combination. Current VLA designs entangle skills across arms, preventing such recomposition and limiting scalability. To address this limitation, we propose SkillVLA, a framework explicitly designed to enable skill reuse in dual-arm manipulation. Extensive experiments demonstrate that SkillVLA substantially improves skill composition, increasing the overall success rate from 0% to 51%, and achieves strong performance on cooperative and long-horizon tasks.
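The combinatorial argument can be made concrete with a small counting sketch. This is an illustration only, not code from SkillVLA: the skill names are hypothetical, and "cost" here simply counts how many distinct behaviors would have to be learned under each assumption.

```python
from itertools import product

# Hypothetical single-arm skill vocabularies (illustrative, not from the paper).
left_skills = ["pick", "hold", "push"]
right_skills = ["pour", "wipe", "open", "place"]

# Every left-right pairing is a potentially distinct bimanual behavior.
pairings = list(product(left_skills, right_skills))

# If skills are entangled across arms, each pairing must be learned separately.
entangled_cost = len(pairings)  # grows as |L| * |R|

# If single-arm skills can be reused, each skill is learned once and recombined.
reuse_cost = len(left_skills) + len(right_skills)  # grows as |L| + |R|

print(entangled_cost, reuse_cost)  # prints "12 7"
```

With three left-arm and four right-arm skills, the gap is already 12 pairings versus 7 skills, and it widens multiplicatively as either vocabulary grows.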