Jun 8, 2026 · arXiv:2606.09268

VGP-Nav: Metric-Aware Visual Geometric Perception for Robot Navigation

Hewei Pan, Weiye Zhu, Zekai Zhang, Zitong Huang, Rongtao Xu, Jinbao Wang, Feng Zheng

AI Summary

The paper introduces VGP-Nav, a unified framework that leverages monocular RGB input to achieve both metric localization and dense obstacle perception in robotic navigation. By anchoring visual geometry to ground-plane scale constraints, VGP-Nav effectively resolves monocular scale ambiguity and produces reliable metric representations for navigation tasks. Experimental results show that VGP-Nav generalizes well across various environments and can be successfully deployed on real mobile robots, underscoring its potential for scalable and cost-effective autonomous navigation solutions.

Key Contribution

Monocular RGB input alone supports both metric localization and dense obstacle perception, with monocular scale ambiguity resolved online through scale constraints derived from ground-plane geometry.

Abstract

Reliable robotic navigation necessitates the seamless integration of accurate global localization and dense, metric-consistent obstacle perception. A common strategy to achieve these capabilities involves integrating diverse sensing modalities: cameras offer rich visual features for localization, while active sensors like LiDAR provide direct metric measurements. However, such multi-sensor configurations necessitate complex spatial-temporal calibration and increase deployment overhead. Although vision-only approaches offer a low-cost and scalable alternative, existing monocular visual systems typically struggle to simultaneously achieve efficient, globally consistent localization and dense, metric-consistent geometric perception. To bridge this gap, we propose VGP-Nav, a unified framework for Metric-Aware Visual Geometric Perception that relies solely on monocular RGB input to jointly support metric localization and obstacle perception. Our key insight is to anchor localization-grounded visual geometry to physically meaningful scale constraints derived from ground-plane geometry, thereby providing a reliable metric reference for monocular perception. VGP-Nav resolves monocular scale ambiguity online and produces localization-grounded, metric obstacle representations that are directly applicable to downstream planning. Extensive experiments demonstrate strong generalization across diverse environments and successful deployment on real mobile robots, highlighting the practicality of our approach for scalable, low-cost, and safe autonomous navigation.
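
The abstract does not spell out how the ground-plane scale constraint is computed. As an illustration only, and not necessarily the method used in VGP-Nav, the following minimal sketch shows one common way a ground plane can fix monocular scale: fit a plane to up-to-scale 3D points believed to lie on the ground, measure the camera's distance to that plane in the reconstruction's arbitrary units, and compare it with the known camera mounting height. The function names (`fit_ground_plane`, `metric_scale`), the camera-frame convention (x right, y down, z forward), the known mounting height, and the assumption that ground points have already been selected are all hypothetical choices made for this example.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def fit_ground_plane(points):
    """Least-squares fit of y = a*x + b*z + c to (x, y, z) points.

    Points are in the camera frame (assumed: x right, y down, z forward)
    and expressed in the reconstruction's arbitrary, up-to-scale units.
    """
    n = len(points)
    sxx = sum(x * x for x, _, _ in points)
    sxz = sum(x * z for x, _, z in points)
    szz = sum(z * z for _, _, z in points)
    sx = sum(x for x, _, _ in points)
    sz = sum(z for _, _, z in points)
    sxy = sum(x * y for x, y, _ in points)
    szy = sum(z * y for _, y, z in points)
    sy = sum(y for _, y, _ in points)

    # Normal equations M @ [a, b, c] = r, solved with Cramer's rule.
    m = [[sxx, sxz, sx], [sxz, szz, sz], [sx, sz, n]]
    r = [sxy, szy, sy]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("ground points are degenerate (e.g. collinear)")

    coeffs = []
    for col in range(3):
        # Replace one column of M with r to get the Cramer numerator.
        mc = [row[:] for row in m]
        for row in range(3):
            mc[row][col] = r[row]
        coeffs.append(_det3(mc) / det)
    return tuple(coeffs)  # (a, b, c)


def metric_scale(ground_points, camera_height_m):
    """Metres per reconstruction unit, from a known camera height."""
    a, b, c = fit_ground_plane(ground_points)
    # Distance from the camera centre (origin) to a*x - y + b*z + c = 0.
    height_units = abs(c) / math.sqrt(a * a + b * b + 1.0)
    if height_units < 1e-12:
        raise ValueError("fitted plane passes through the camera centre")
    return camera_height_m / height_units
```

Multiplying up-to-scale translations and obstacle depths by the returned factor would express them in metres under these assumptions. A practical system would additionally need robust ground-point selection and outlier rejection (for example RANSAC), which this sketch omits.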

Multimodal Models · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
