Tsinghua AI | May 21, 2026 | arXiv:2605.22816

AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation

Wenxuan Guo, Xiuwei Xu, Yichen Liu, Xiangyu Li, Hang Yin, Huangxing Chen, Wenzhao Zheng, Jianjiang Feng, Jie Zhou, Jiwen Lu

AI Summary

AwareVLN improves Vision-Language Navigation (VLN) by adding a self-aware reasoning mechanism that lets the model understand the agent's state and task progress. It combines a structural reasoning module for spatial and task-oriented self-awareness with an automatic data engine that uses progress division for training. Experiments in the Habitat simulator show that AwareVLN significantly outperforms existing VLN methods.

Key Contribution

AwareVLN lets a VLN agent reason about its own state and task progress in a fully end-to-end model, bridging the gap between end-to-end VLM action prediction and explicit scene mapping.

Abstract

Vision-and-Language Navigation (VLN) requires an agent to ground language instructions to its own movement within a visual environment. While state-of-the-art methods leverage the reasoning capabilities of Vision-Language Models (VLMs) for end-to-end action prediction, they often lack an explicit and explainable understanding of the relationships between the agent, the instruction, and the scene. Conversely, explicitly building a scene map for heuristic planning is intuitively appealing but relies on additional 3D sensors and hinders large-scale vision-language pre-training. To bridge this gap, we propose AwareVLN, a novel framework that equips the navigation model with a self-aware reasoning mechanism, enabling it to understand the agent's state and task progress in a fully end-to-end and data-driven manner. Our approach features two key innovations: (1) a structural reasoning module that fosters spatial and task-oriented self-awareness, and (2) an automatic data engine with progress division for effective training. Extensive experiments on various datasets in the Habitat simulator show that AwareVLN significantly outperforms previous state-of-the-art vision-language navigation methods. Project page: https://gwxuan.github.io/AwareVLN/.

Multimodal Models | Reasoning & Chain-of-Thought | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
