May 26, 2026 · arXiv:2605.26503
Tags: Gaozzzz, github, Huawei, Institute of Automation, Noah's Ark Lab, School of Software Technology, Uncertainty-Aware-VLN, ZJU

Uncertainty-Aware Gaussian Map for Vision-Language Navigation

Jianzhe Gao, Yuxuan Xu, Tongtong Cao, Yingxue Zhang, Zhanguang Zhang, Sida Peng, Yi Yang, Wenguan Wang

AI Summary

This paper introduces an Uncertainty-Aware Gaussian Map (UAGM) for Vision-Language Navigation (VLN) that explicitly models geometric, semantic, and appearance uncertainty to improve the agent's decision-making. The agent first constructs a Semantic Gaussian Map (SGM) from panoramic observations. It then estimates geometric uncertainty through variational perturbations of Gaussian position and scale, semantic uncertainty through perturbations of Gaussian semantic attributes, and appearance uncertainty through Fisher Information. Integrating these uncertainties into a unified 3D Value Map lets the agent ground them as affordances and constraints, which improves performance on multiple VLN benchmarks.
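The perturbation step can be illustrated with a minimal Monte Carlo sketch. It is not the paper's implementation: the isotropic Gaussians, occupancy defined as `1 - exp(-density)`, the fixed noise levels (`sigma_pos`, `sigma_log_scale`, `sigma_sem`) standing in for a learned variational distribution, the small uniform prior for unobserved space, and the choice of occupancy spread and predictive entropy as the two scores are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def gaussian_weights(x, mu, scale, opacity):
    """Opacity-weighted isotropic Gaussian footprints at query points.

    x: (Q, 3) query points; mu: (N, 3) means; scale, opacity: (N,).
    Returns a (Q, N) array of per-Gaussian contributions.
    """
    d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    return opacity[None, :] * np.exp(-0.5 * d2 / scale[None, :] ** 2)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def semantic_probs(x, mu, scale, opacity, logits, prior=1e-3):
    """Class distribution at x: footprint-weighted mix of per-Gaussian softmaxes.

    The small uniform prior (an assumption) makes unobserved space maximally ambiguous.
    """
    w = gaussian_weights(x, mu, scale, opacity)                   # (Q, N)
    k = logits.shape[-1]
    mixed = w @ softmax(logits) + prior / k                       # (Q, K)
    return mixed / (w.sum(-1, keepdims=True) + prior)

def perturbation_uncertainty(x, mu, scale, opacity, logits, n_samples=64,
                             sigma_pos=0.05, sigma_log_scale=0.1, sigma_sem=0.5, seed=0):
    """Monte Carlo geometric and semantic uncertainty at query points x."""
    rng = np.random.default_rng(seed)
    occ, probs = [], []
    for _ in range(n_samples):
        # Geometric perturbation: jitter means and log-scales.
        mu_s = mu + sigma_pos * rng.standard_normal(mu.shape)
        scale_s = scale * np.exp(sigma_log_scale * rng.standard_normal(scale.shape))
        density = gaussian_weights(x, mu_s, scale_s, opacity).sum(-1)
        occ.append(1.0 - np.exp(-density))                        # occupancy in [0, 1)
        # Semantic perturbation: jitter logits only, geometry held fixed.
        logits_s = logits + sigma_sem * rng.standard_normal(logits.shape)
        probs.append(semantic_probs(x, mu, scale, opacity, logits_s))
    geo_unc = np.stack(occ).std(axis=0)                           # spread of occupancy
    p_mean = np.stack(probs).mean(axis=0)
    sem_unc = -(p_mean * np.log(p_mean + 1e-12)).sum(-1)          # predictive entropy
    return geo_unc, sem_unc

# Toy map: two Gaussians labelled with different classes (K = 3).
mu = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
scale = np.array([0.2, 0.2])
opacity = np.array([1.0, 1.0])
logits = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
queries = np.array([[0.0, 0.0, 0.0],    # centre of Gaussian 0
                    [0.5, 0.0, 0.0],    # midway between the two classes
                    [0.2, 0.0, 0.0],    # edge of Gaussian 0
                    [5.0, 5.0, 5.0]])   # unobserved space
geo, sem = perturbation_uncertainty(queries, mu, scale, opacity, logits)
```

In this toy setting, geometric uncertainty is highest where the occupancy boundary lies (small shifts of a Gaussian change occupancy most there), and semantic uncertainty is highest midway between differently labelled Gaussians or in unobserved space.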

Key Contribution

The paper addresses perceptual uncertainty in vision-language navigation by explicitly modeling geometric, semantic, and appearance uncertainty in an Uncertainty-Aware Gaussian Map and integrating these estimates into the agent's observation space.

Abstract

Vision-Language Navigation (VLN) requires an agent to navigate 3D environments following natural language instructions. During navigation, existing agents commonly encounter perceptual uncertainty, such as insufficient evidence for reliable grounding or ambiguity in interpreting spatial cues, yet they typically ignore such information when predicting actions. In this work, we explicitly model three forms of perceptual uncertainty (i.e., geometric, semantic, and appearance uncertainty) and integrate them into the agent's observation space to enable informed decision-making. Concretely, our agent first constructs a Semantic Gaussian Map (SGM), composed of differentiable 3D Gaussian primitives initialized from panoramic observations, that encodes both the geometric structure and semantic content of the environment. On top of SGM, geometric uncertainty is estimated through variational perturbations of Gaussian position and scale to assess structural reliability; semantic uncertainty is captured by perturbing Gaussian semantic attributes to reveal ambiguous interpretations; and appearance uncertainty is characterized by Fisher Information, which measures the sensitivity of rendered observations to Gaussian-level variations. These uncertainties are incorporated into SGM, extending it into a unified 3D Value Map, which grounds them as affordances and constraints that support reliable navigation. Comprehensive evaluations across multiple VLN benchmarks show the effectiveness of our agent.
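The abstract characterizes appearance uncertainty with Fisher Information, i.e., how sensitive the rendered observations are to each Gaussian's parameters. The following is a minimal sketch under several assumptions that are not from the paper: a toy additive 2D splatting renderer (no alpha compositing or 3D projection), Gaussian pixel noise with standard deviation `noise_std`, a per-Gaussian block-diagonal Fisher approximation over (color, mean), and the trace of the damped inverse Fisher block as the uncertainty score. The function names and parameters are hypothetical.

```python
import numpy as np

def render_jacobians(pix, mu, scale, opacity, color):
    """Jacobian of a toy 2D renderer w.r.t. each Gaussian's (color, mu_x, mu_y).

    Assumed renderer (no alpha compositing): I(p) = sum_i opacity_i * color_i * G_i(p),
    with G_i(p) = exp(-||p - mu_i||^2 / (2 * scale_i^2)).
    pix: (P, 2) pixel centres; mu: (N, 2); scale, opacity, color: (N,).
    Returns J with shape (N, P, 3).
    """
    diff = pix[None, :, :] - mu[:, None, :]                          # (N, P, 2)
    g = np.exp(-0.5 * (diff ** 2).sum(-1) / scale[:, None] ** 2)      # (N, P)
    d_color = opacity[:, None] * g                                    # dI/dcolor_i
    d_mu = ((opacity * color)[:, None, None] * g[..., None]
            * diff / scale[:, None, None] ** 2)                      # dI/dmu_i
    return np.concatenate([d_color[..., None], d_mu], axis=-1)

def appearance_uncertainty(pix, mu, scale, opacity, color, noise_std=0.05, damping=1e-6):
    """Per-Gaussian uncertainty as the trace of the inverse (damped) Fisher block."""
    J = render_jacobians(pix, mu, scale, opacity, color)
    # Gaussian pixel noise gives Fisher information F = J^T J / noise_std^2 per Gaussian.
    fisher = np.einsum("npa,npb->nab", J, J) / noise_std ** 2         # (N, 3, 3)
    cov = np.linalg.inv(fisher + damping * np.eye(3))                 # Laplace-style covariance
    return np.trace(cov, axis1=1, axis2=2)

# One 32x32 view; the second Gaussian lies outside it and is never observed.
ys, xs = np.mgrid[0:32, 0:32]
pix = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(float)
mu = np.array([[16.0, 16.0], [80.0, 16.0]])
scale = np.array([3.0, 3.0])
opacity = np.array([0.9, 0.9])
color = np.array([0.8, 0.8])
unc = appearance_uncertainty(pix, mu, scale, opacity, color)
```

The observed Gaussian strongly influences many pixels, so its Fisher block is large and its uncertainty small. The unobserved Gaussian has a near-zero Fisher block, so its score is dominated by the damping term. Because Fisher information from independent observations adds, a multi-view variant would sum the `fisher` blocks over views before inverting.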

Computer Vision · Multimodal Models · Robotics & Embodied AI

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
