This paper introduces OneVLA, a unified framework that integrates navigation and manipulation tasks for embodied intelligence, addressing the limitations of existing Vision-Language-Action (VLA) models that are typically specialized for either task. By employing a novel unified action head and a multi-stage progressive training strategy that leverages curated data and Chain-of-Thought fine-tuning, OneVLA facilitates significant positive transfer and mutual reinforcement between navigation and manipulation. Experimental results demonstrate that OneVLA achieves state-of-the-art performance in both simulated and real-world environments, outperforming specialized and existing cross-task models, thereby advancing the development of general-purpose robotic agents.
OneVLA unifies navigation and manipulation in a single framework, so one model can interpret natural language commands and act on them in either setting.
Navigation and manipulation are fundamental capabilities of embodied intelligence, enabling robots to interpret natural language commands and interact physically with their surroundings. However, current Vision-Language-Action (VLA) models remain constrained by task-specific architectures, specializing in either navigation or manipulation, which hinders the development of general-purpose robotic agents. To bridge this gap, we introduce OneVLA, a unified architecture that integrates these distinct tasks into a single, cohesive framework. Specifically, we design a unified action head capable of generating both navigation and manipulation actions without requiring task-specific variants. Furthermore, we propose a multi-stage progressive training strategy, incorporating curated data construction and Chain-of-Thought (CoT) fine-tuning, that facilitates strong positive transfer and mutual reinforcement between the two domains. Extensive experiments in both simulated and real-world environments demonstrate that OneVLA achieves state-of-the-art performance, significantly outperforming both specialized single-task and existing cross-task models. By unifying these core capabilities, OneVLA paves the way for truly general-purpose robotic systems. The model and source code will be publicly released.