Mar 16, 2026 | arXiv:2603.15370

Trajectory-Diversity-Driven Robust Vision-and-Language Navigation

Jiangyang Li, Cong Wan, SongLin Dong, Chenhao Ding, Qiang Wang, Zhiheng Ma, Yihong Gong

AI Summary

This paper introduces NavGRPO, a reinforcement learning framework for Vision-and-Language Navigation (VLN) that uses Group Relative Policy Optimization (GRPO) to train goal-directed navigation policies. NavGRPO explores diverse trajectories and optimizes the policy by comparing performance within each group of trajectories, which lets the agent learn effective strategies beyond expert demonstrations. Built on ScaleVLN and evaluated on the R2R and REVERIE benchmarks, NavGRPO improves Success weighted by Path Length (SPL) by +3.0% and +1.71% in unseen environments, and by +14.89% over the baseline under extreme early-stage perturbations.

Key Contribution

RL agents can learn more robust vision-and-language navigation policies by exploring diverse trajectories and comparing their performance within groups, which lets them find effective strategies beyond expert paths without requiring an additional value network.
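To make the within-group comparison concrete, here is a minimal sketch of the group-relative advantage used in standard GRPO: each trajectory's reward is normalized against the mean and standard deviation of its own group, so no learned value network is needed as a baseline. The function name `group_relative_advantages`, the `eps` term, and the example rewards are illustrative assumptions; this page does not specify the reward NavGRPO actually uses.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each trajectory reward against its own group's statistics.

    `rewards` holds one scalar reward per trajectory sampled for the same
    instruction (hypothetical values; the paper's reward is not given here).
    """
    if not rewards:
        raise ValueError("rewards must contain at least one trajectory")
    mean = sum(rewards) / len(rewards)
    # Population variance over the group; the group mean acts as the baseline.
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    # eps avoids division by zero when every trajectory gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four trajectories: two reach the goal, two do not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Trajectories that beat the group average receive a positive advantage and the others a negative one; when all rewards in a group are identical, every advantage is zero and that group contributes no gradient signal.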

Abstract

Vision-and-Language Navigation (VLN) requires agents to navigate photo-realistic environments following natural language instructions. Current methods predominantly rely on imitation learning, which suffers from limited generalization and poor robustness to execution perturbations. We present NavGRPO, a reinforcement learning framework that learns goal-directed navigation policies through Group Relative Policy Optimization. By exploring diverse trajectories and optimizing via within-group performance comparisons, our method enables agents to distinguish effective strategies beyond expert paths without requiring additional value networks. Built on ScaleVLN, NavGRPO achieves superior robustness on R2R and REVERIE benchmarks with +3.0% and +1.71% SPL improvements in unseen environments. Under extreme early-stage perturbations, we demonstrate +14.89% SPL gain over the baseline, confirming that goal-directed RL training builds substantially more robust navigation policies. Code and models will be released.
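For readers unfamiliar with the metric behind these numbers, the following sketch computes SPL under its commonly used definition: the average over episodes of the success indicator times the ratio of shortest-path length to the larger of the path actually taken and the shortest path. The function name `spl`, the tuple layout, and the example episodes are illustrative assumptions, not values or code from the paper.

```python
from typing import Iterable, Tuple

# (success, shortest_path_length, agent_path_length) for one episode.
Episode = Tuple[bool, float, float]


def spl(episodes: Iterable[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    episodes = list(episodes)
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for success, shortest, taken in episodes:
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # If both lengths are zero the agent started at the goal: full credit.
        total += 1.0 if denom == 0 else shortest / denom
    return total / len(episodes)


# Hypothetical episodes: an optimal success, a success with a 2x detour, a failure.
score = spl([(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 10.0, 5.0)])
```

Because the ratio is capped at 1 and failures score 0, SPL rewards agents that both reach the goal and do so along near-shortest paths, which is why it is sensitive to detours caused by early perturbations.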

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
