Mar 18, 2026 | arXiv:2603.17437

FloorPlan-VLN: A New Paradigm for Floor Plan Guided Vision-Language Navigation

Kehan Chen, Yan Huang, Dong An, Jiawei He, Yifei Su, Jing Liu, Nianfeng Liu, Liang Wang

AI Summary

The paper introduces FloorPlan-VLN, a new vision-language navigation paradigm that uses semantic floor plans as global spatial priors so that agents can navigate from concise instructions. The authors construct a dataset of over 10k episodes across 72 scenes, pairing more than 100 semantically annotated floor plans with Matterport3D-based trajectories and concise instructions. They also propose FP-Nav, a method that uses a dual-view, spatio-temporally aligned video sequence and auxiliary reasoning tasks to align observations, floor plans, and instructions. FP-Nav achieves more than a 60% relative improvement in navigation success rate over adapted state-of-the-art VLN baselines.

Key Contribution

Forget verbose instructions: this new VLN paradigm uses floor plans to guide navigation with concise commands, improving navigation success rate by more than 60% relative to adapted VLN baselines.

Abstract

Existing Vision-Language Navigation (VLN) tasks require agents to follow verbose instructions and ignore potentially useful global spatial priors, which limits the agents' ability to reason about spatial structure. Although human-readable spatial schematics (e.g., floor plans) are ubiquitous in real-world buildings, current agents lack the cognitive ability to comprehend and use them. To bridge this gap, we introduce FloorPlan-VLN, a new paradigm that leverages structured semantic floor plans as global spatial priors to enable navigation with only concise instructions. We first construct the FloorPlan-VLN dataset, which comprises over 10k episodes across 72 scenes. It pairs more than 100 semantically annotated floor plans with Matterport3D-based navigation trajectories and concise instructions that omit step-by-step guidance. We then propose FP-Nav, a simple yet effective method that uses a dual-view, spatio-temporally aligned video sequence and auxiliary reasoning tasks to align observations, floor plans, and instructions. Evaluated on this new benchmark, our method significantly outperforms adapted state-of-the-art VLN baselines, achieving more than a 60% relative improvement in navigation success rate. Furthermore, comprehensive noise modeling and real-world deployments demonstrate the feasibility of FP-Nav and its robustness to actuation drift and floor plan distortions. These results validate the effectiveness of floor plan guided navigation and highlight FloorPlan-VLN as a promising step toward more spatially intelligent navigation.
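The abstract reports a relative (not absolute) improvement in success rate. As a minimal sketch of what that distinction means, the snippet below computes a relative gain from two success rates. The function name `relative_improvement` and the numbers are hypothetical illustrations, not values or code from the paper.

```python
def relative_improvement(baseline_sr: float, method_sr: float) -> float:
    """Return the gain of method_sr over baseline_sr as a fraction of the baseline."""
    if baseline_sr <= 0:
        raise ValueError("baseline success rate must be positive")
    return (method_sr - baseline_sr) / baseline_sr


# Hypothetical success rates chosen only for illustration (NOT from the paper).
baseline = 0.25   # assumed baseline success rate
method = 0.40     # assumed success rate of the improved method

gain = relative_improvement(baseline, method)
absolute_gain = method - baseline

print(f"relative improvement: {gain:.0%}")
print(f"absolute improvement: {absolute_gain:.0%} (percentage points)")
```

With these assumed numbers the absolute gain is 15 percentage points while the relative gain is 60%, which is why the two figures should not be read interchangeably.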

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 81

Year: 2026

Venue: N/A
