Feb 22, 2026

arXiv:2602.19308

WildOS: Open-Vocabulary Object Search in the Wild

Hardik Shah, Erica Tevere, Deegan Atha, Marcel Kaufmann, Shehryar Khattak, Manthan Patel, Marco Hutter, Jonas Frey, Patrick Spieler

AI Summary

The paper introduces WildOS, a system for long-range, open-vocabulary object search that integrates geometric exploration with semantic visual reasoning using foundation models. WildOS constructs a sparse navigation graph and employs ExploRFM, a vision module based on foundation models, to score frontier nodes based on traversability, visual frontiers, and object similarity. A particle filter is used for coarse localization of open-vocabulary targets, enabling planning towards distant goals, and closed-loop field experiments demonstrate WildOS's superior performance compared to geometric and vision-based baselines.
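As a rough illustration of the frontier scoring described above, the sketch below combines the three cues into one score per frontier node. The summary does not say how WildOS fuses them, so the `FrontierNode` fields, the weights, and the traversability cutoff are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class FrontierNode:
    node_id: int
    traversability: float     # assumed to lie in [0, 1]
    visual_frontier: float    # assumed to lie in [0, 1]
    object_similarity: float  # assumed to lie in [0, 1]


def score_frontier(node, w_frontier=0.5, w_object=0.5, min_traversability=0.3):
    """Hypothetical fusion rule: gate on traversability, then weight the semantic cues."""
    if node.traversability < min_traversability:
        return 0.0  # treat the node as unsafe and never prefer it
    semantic = w_frontier * node.visual_frontier + w_object * node.object_similarity
    return node.traversability * semantic


def best_frontier(nodes):
    """Return the frontier node with the highest combined score."""
    return max(nodes, key=score_frontier)


nodes = [
    FrontierNode(0, traversability=0.9, visual_frontier=0.2, object_similarity=0.1),
    FrontierNode(1, traversability=0.8, visual_frontier=0.6, object_similarity=0.7),
    FrontierNode(2, traversability=0.1, visual_frontier=1.0, object_similarity=1.0),
]
print(best_frontier(nodes).node_id)
```

In this toy setup, node 2 looks most like the target but falls below the traversability cutoff, so the safer node 1 is chosen instead.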

Key Contribution

Robots can now navigate complex outdoor environments and find objects using natural language queries, even without prior maps or precise depth sensing.

Abstract

Autonomous navigation in complex, unstructured outdoor environments requires robots to operate over long ranges without prior maps and with limited depth sensing. In such settings, relying solely on geometric frontiers for exploration is often insufficient; the ability to reason semantically about where to go and what is safe to traverse is crucial for robust, efficient exploration. This work presents WildOS, a unified system for long-range, open-vocabulary object search that combines safe geometric exploration with semantic visual reasoning. WildOS builds a sparse navigation graph to maintain spatial memory, while utilizing a foundation-model-based vision module, ExploRFM, to score frontier nodes of the graph. ExploRFM simultaneously predicts traversability, visual frontiers, and object similarity in image space, enabling real-time, onboard semantic navigation tasks. The resulting vision-scored graph enables the robot to explore semantically meaningful directions while ensuring geometric safety. Furthermore, we introduce a particle-filter-based method for coarse localization of the open-vocabulary target query that estimates candidate goal positions beyond the robot's immediate depth horizon, enabling effective planning toward distant goals. Extensive closed-loop field experiments across diverse off-road and urban terrains demonstrate that WildOS enables robust navigation, significantly outperforming purely geometric and purely vision-based baselines in both efficiency and autonomy. Our results highlight the potential of vision foundation models to drive open-world robotic behaviors that are both semantically informed and geometrically grounded. Project Page: https://leggedrobotics.github.io/wildos/
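To make the idea of coarse, particle-filter-based target localization more concrete, here is a minimal sketch. It assumes, purely for illustration, that each image detection yields only a world-frame bearing from the robot to the target, with no depth. The abstract does not specify the measurement model, so the functions, noise values, and resampling scheme below are hypothetical and are not the authors' implementation.

```python
import math
import random


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def init_particles(n, x_range, y_range, rng):
    """Spread candidate target positions uniformly over a 2D search area."""
    return [(rng.uniform(*x_range), rng.uniform(*y_range)) for _ in range(n)]


def update(particles, robot_xy, bearing, sigma, rng, jitter=0.5):
    """Reweight particles by bearing agreement, resample, and add small jitter."""
    rx, ry = robot_xy
    weights = []
    for px, py in particles:
        err = wrap_angle(math.atan2(py - ry, px - rx) - bearing)
        # Gaussian likelihood on the bearing error; the floor avoids all-zero weights.
        weights.append(math.exp(-0.5 * (err / sigma) ** 2) + 1e-12)
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    return [(x + rng.gauss(0, jitter), y + rng.gauss(0, jitter)) for x, y in resampled]


def estimate(particles):
    """Mean of the (equally weighted) resampled particles."""
    n = len(particles)
    return (sum(p[0] for p in particles) / n, sum(p[1] for p in particles) / n)


rng = random.Random(0)
target = (40.0, 10.0)  # ground truth, used only to simulate bearings
particles = init_particles(5000, (-50, 100), (-50, 100), rng)
path = [(0, 0), (5, -10), (10, -20), (20, -25), (30, -25), (40, -20)]
for robot_xy in path:
    bearing = math.atan2(target[1] - robot_xy[1], target[0] - robot_xy[0])
    particles = update(particles, robot_xy, bearing, sigma=0.05, rng=rng)
goal_guess = estimate(particles)
```

Because bearings observed from different viewpoints intersect near the target, the particle cloud narrows as the robot moves, giving a candidate goal position even though no single observation carries range information.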

Computer Vision, Robotics & Embodied AI, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
