Qwen-RobotNav is a scalable navigation model designed for agentic navigation systems, allowing for dynamic reconfiguration of observation strategies at inference time to accommodate various tasks like instruction following and autonomous driving. The model employs a parameterized interface that integrates multiple task modes and controllable observation parameters, enabling robust performance without architectural changes during inference. Extensive training on 15.6M samples, combined with vision-language co-training, leads to state-of-the-art results across major navigation benchmarks and showcases strong zero-shot generalization capabilities in real-world robotic applications.
Qwen-RobotNav lets its observation strategy be reconfigured at inference time without architectural changes, and sets new state-of-the-art results across major navigation benchmarks.
Agentic navigation systems require a base navigation model whose observation strategy can be externally reconfigured at inference time, because instruction following, object search, target tracking, and autonomous driving share the same perception-planning backbone yet demand fundamentally different strategies for consuming the visual stream. We present Qwen-RobotNav, a scalable navigation model that addresses this requirement through a parameterised interface with two complementary dimensions: multiple task modes that select the navigation behaviour, and controllable observation parameters (e.g., token budget, per-camera weights) that govern how visual history is encoded. With training-time randomisation over all parameters, Qwen-RobotNav is robust to any inference-time configuration and requires zero architectural modification to its backbone. We train Qwen-RobotNav on 15.6M samples; co-training with vision-language data prevents the collapse into reactive action-sequence mappers observed in trajectory-only training. The parameterised interface also makes Qwen-RobotNav a natural building block for agentic systems: for long-horizon scenarios, an upper-level planner decomposes goals into sub-tasks and dynamically switches Qwen-RobotNav's task mode and context strategy mid-episode, composing complex behaviours from repeated calls to the same model. Extensive experiments show that Qwen-RobotNav sets new state-of-the-art results across major navigation benchmarks. The model exhibits favourable scaling from 2B to 8B parameters, with joint multi-task training developing a shared spatial-planning substrate that transfers across task families, and demonstrates strong zero-shot generalisation to real-world robots across diverse environments.
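To make the two dimensions of the parameterised interface concrete, the following Python sketch shows one way such a configuration could be represented. It is an illustration under stated assumptions, not the authors' implementation: the names `TaskMode`, `ObservationConfig`, `allocate_tokens`, and `sample_training_config`, the candidate budgets, the weight range, and the proportional split of the token budget across cameras are all hypothetical. Only the four task families and the two example parameters (token budget, per-camera weights) come from the abstract.

```python
import random
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, List, Sequence


class TaskMode(Enum):
    # The four task families named in the abstract.
    INSTRUCTION_FOLLOWING = "instruction_following"
    OBJECT_SEARCH = "object_search"
    TARGET_TRACKING = "target_tracking"
    AUTONOMOUS_DRIVING = "autonomous_driving"


@dataclass
class ObservationConfig:
    # Hypothetical container for one inference-time configuration.
    task_mode: TaskMode
    token_budget: int                # total visual tokens for the history
    camera_weights: Dict[str, float]  # relative importance per camera


def allocate_tokens(cfg: ObservationConfig) -> Dict[str, int]:
    """Split the token budget across cameras in proportion to their weights.

    The proportional rule is an assumption made for illustration only.
    """
    total = sum(cfg.camera_weights.values())
    if total <= 0:
        raise ValueError("camera weights must sum to a positive value")
    shares = {
        name: int(cfg.token_budget * weight / total)
        for name, weight in cfg.camera_weights.items()
    }
    # Tokens lost to rounding down go to the highest-weight camera.
    leftover = cfg.token_budget - sum(shares.values())
    top = max(cfg.camera_weights, key=cfg.camera_weights.get)
    shares[top] += leftover
    return shares


def sample_training_config(
    rng: random.Random,
    cameras: List[str],
    budgets: Sequence[int] = (64, 128, 256, 512),  # assumed candidate budgets
) -> ObservationConfig:
    """Draw a random configuration, mimicking training-time randomisation."""
    return ObservationConfig(
        task_mode=rng.choice(list(TaskMode)),
        token_budget=rng.choice(list(budgets)),
        camera_weights={c: rng.uniform(0.1, 1.0) for c in cameras},
    )


if __name__ == "__main__":
    cfg = ObservationConfig(
        TaskMode.OBJECT_SEARCH, 100, {"front": 2.0, "left": 1.0, "right": 1.0}
    )
    print(allocate_tokens(cfg))
    # A planner could switch the task mode mid-episode and reuse the rest.
    print(replace(cfg, task_mode=TaskMode.TARGET_TRACKING).task_mode)
```

In this sketch, an upper-level planner would only build a new `ObservationConfig` between sub-tasks, for example with `dataclasses.replace`. The model itself stays unchanged, which mirrors the abstract's claim that reconfiguration needs no architectural modification.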