This paper introduces Y-BotFrame, an extensible framework that turns quadruped robots into intelligent ground assistants by combining multimodal perception (speech, vision, and LiDAR) with a large language model. The language model interprets spoken user commands and plans the corresponding tasks, so the robot can carry them out without a remote controller. The framework's modular design supports plug-and-play integration of new functions and incremental upgrades, and is offered as a reference implementation for deploying instruction-driven embodied agents in real-world settings.
Y-BotFrame transforms quadruped robots into intelligent assistants that can understand and execute natural-language commands in real time.
Quadruped robots are capable of traversing a wide range of complex terrains with high flexibility. As highly mobile ground-based intelligent platforms, they can be equipped with modules for navigation control, environmental perception, and intelligent interaction, thereby serving as real-world mobile deployment platforms for various algorithms. In this paper, we introduce Y-BotFrame, an extensible embodied platform that turns a quadruped robot into an intelligent ground assistant. Y-BotFrame integrates multimodal perception capabilities, including speech, vision, and LiDAR, and employs a large language model as the cognitive core for environmental understanding, contextual reasoning, and task planning. The system maps users' natural-language instructions into executable embodied task units that can be carried out by the robot. Y-BotFrame supports natural interaction through voice commands and visual feedback, removing the need for a remote controller and enabling efficient human-robot collaboration. With a highly extensible framework, Y-BotFrame supports plug-and-play integration of new functional modules as well as modular upgrades and iterative development, offering a reference implementation for the real-world deployment of general-purpose, instruction-driven embodied agents. The supplementary video is available at https://xdei-group.github.io/Y-BotFrame/.
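To make the idea of mapping an instruction to executable task units with plug-and-play modules concrete, the following is a minimal illustrative sketch in Python. It is not Y-BotFrame's actual code or API: the names `TaskUnit`, `SkillRegistry`, `navigate`, `describe_scene`, and `plan_from_instruction` are all hypothetical, and the large language model planner is replaced by naive keyword matching purely so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskUnit:
    """One executable step: a skill name plus its arguments (hypothetical schema)."""
    skill: str
    args: Dict[str, str] = field(default_factory=dict)


class SkillRegistry:
    """Plug-and-play registry: modules add skills without changing the dispatcher."""

    def __init__(self) -> None:
        self._skills: Dict[str, Callable[..., str]] = {}

    def register(self, name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
        # Decorator that stores the function under the given skill name.
        def decorator(fn: Callable[..., str]) -> Callable[..., str]:
            self._skills[name] = fn
            return fn
        return decorator

    def execute(self, plan: List[TaskUnit]) -> List[str]:
        # Run each task unit in order; report unknown skills instead of raising.
        results: List[str] = []
        for unit in plan:
            handler = self._skills.get(unit.skill)
            if handler is None:
                results.append(f"unknown skill: {unit.skill}")
                continue
            results.append(handler(**unit.args))
        return results


registry = SkillRegistry()


@registry.register("navigate")
def navigate(target: str) -> str:
    # Placeholder for a navigation module (assumed, not from the paper).
    return f"navigating to {target}"


@registry.register("describe_scene")
def describe_scene() -> str:
    # Placeholder for a vision module (assumed, not from the paper).
    return "describing the camera view"


def plan_from_instruction(instruction: str) -> List[TaskUnit]:
    """Stand-in for the LLM planner: keyword matching, for illustration only."""
    text = instruction.lower()
    plan: List[TaskUnit] = []
    if "go to" in text:
        target = text.split("go to", 1)[1].split(" and ")[0].strip(" .")
        plan.append(TaskUnit("navigate", {"target": target}))
    if "describe" in text:
        plan.append(TaskUnit("describe_scene"))
    return plan


if __name__ == "__main__":
    plan = plan_from_instruction("Go to the kitchen and describe what you see.")
    for line in registry.execute(plan):
        print(line)
```

In this sketch, adding a capability means registering one more function under a new skill name; the dispatcher and the planner interface stay unchanged, which is one plausible reading of the plug-and-play extensibility described in the abstract.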