The paper introduces CoViLLM, a framework that uses LLMs to enable robots to collaboratively assemble customized and novel products with humans. CoViLLM integrates depth-camera-based localization, classification of new components by the human operator, and an LLM for assembly task planning from natural language instructions. Experiments on the NIST Assembly Task Board demonstrate the framework's ability to extend human-robot collaboration (HRC) beyond predefined product and task settings.
Robots can now assemble novel products alongside humans, guided by natural language, thanks to a new LLM-powered framework.
With increasing demand for mass customization, traditional manufacturing robots that rely on rule-based operations lack the flexibility to accommodate customized or new product variants. Human-Robot Collaboration (HRC) has demonstrated potential to improve system adaptability by leveraging human versatility and decision-making capabilities. However, existing HRC frameworks typically depend on predefined perception-manipulation pipelines, limiting their ability to autonomously generate task plans for new product assembly. In this work, we propose CoViLLM, an adaptive human-robot collaborative assembly framework that supports the assembly of customized and previously unseen products. CoViLLM combines depth-camera-based localization for object position estimation, classification of new components by the human operator, and a Large Language Model (LLM) for assembly task planning based on natural language instructions. The framework is validated on the NIST Assembly Task Board for known, customized, and new product cases. Experimental results show that the proposed framework enables flexible collaborative assembly by extending HRC beyond predefined product and task settings.
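The abstract describes a three-stage pipeline: depth-camera localization of parts, operator labeling of components the perception system cannot classify, and LLM-based task planning from a natural-language instruction. The paper does not include code, so the following is only a minimal sketch of how such an orchestration loop might be wired; `Part`, `ask_operator_label`, `plan_assembly`, and the `llm` callable are hypothetical names, not the authors' API, and the prompt format is illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Part:
    """A detected component: 3D position from the depth camera, label possibly unknown."""
    position: tuple[float, float, float]  # (x, y, z) in the robot base frame, metres
    label: Optional[str] = None           # None until classified

def ask_operator_label(part: Part) -> str:
    """Human-in-the-loop step (hypothetical interface): the operator
    names a component the perception system could not classify."""
    return input(f"Unknown part at {part.position} -- label? ").strip()

def plan_assembly(instruction: str, parts: list[Part],
                  llm: Callable[[str], str]) -> list[str]:
    """Prompt an LLM with the natural-language instruction and the perceived
    scene, and parse its reply into an ordered list of primitive actions."""
    # Resolve unknown components with the human operator before planning.
    for part in parts:
        if part.label is None:
            part.label = ask_operator_label(part)
    inventory = "\n".join(f"- {p.label} at {p.position}" for p in parts)
    prompt = (
        "You are the task planner for a collaborative assembly robot.\n"
        f"Available parts:\n{inventory}\n"
        f"Task: {instruction}\n"
        "Respond with one primitive action per line, e.g. 'pick <part>' "
        "or 'place <part> on <part>'."
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

if __name__ == "__main__":
    # Canned LLM response standing in for a real model call.
    def fake_llm(prompt: str) -> str:
        return ("pick bracket\n"
                "place bracket on board\n"
                "pick gear_large\n"
                "place gear_large on bracket")

    parts = [Part((0.42, -0.10, 0.03), "gear_large"),
             Part((0.30, 0.05, 0.03))]  # unseen component; triggers operator prompt
    for step in plan_assembly("Mount the large gear on the new bracket.", parts, fake_llm):
        print(step)
```

Keeping the model behind a plain `llm: Callable[[str], str]` interface lets any LLM backend be swapped in, and the operator prompt only fires for parts the perception stage could not classify, mirroring the abstract's division of labor between robot perception and human decision-making.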