Fudan · Apr 14, 2026 · arXiv:2604.12486

DeCoNav: Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation

Sunyao Zhou, Yunzi Wu, Tianhang Wang, Xinhai Li, Guang Chen, Lizheng Liu, Lizhen Liu, Chenjia Bai, Xuelong Li

AI Summary

This paper introduces DeCoNav, a decentralized framework for long-horizon collaborative vision-language navigation (VLN) that uses event-triggered dialogue for dynamic task allocation and replanning. DeCoNav addresses two limitations of existing collaborative VLN benchmarks and evaluations: it enforces synchronized dual-robot rollout on a shared world timeline, and it replaces static coordination with coordination that adapts to real-time evidence. Experiments on the DeCoNavBench benchmark, comprising 1,213 tasks across 176 HM3D scenes, show that DeCoNav improves the both-success rate (BSR) by 69.2% relative to static coordination policies.
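The summary does not define how BSR is computed. Assuming, from its name, that an episode counts toward BSR only when both robots succeed, the metric can be sketched as follows. `EpisodeResult`, `both_success_rate`, and `mean_individual_success_rate` are hypothetical names, and the episode outcomes are made-up toy data, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Outcome of one two-robot episode (hypothetical record)."""
    robot_a_success: bool
    robot_b_success: bool


def both_success_rate(results: list[EpisodeResult]) -> float:
    """Assumed BSR: fraction of episodes in which BOTH robots succeed."""
    if not results:
        return 0.0
    both = sum(r.robot_a_success and r.robot_b_success for r in results)
    return both / len(results)


def mean_individual_success_rate(results: list[EpisodeResult]) -> float:
    """For contrast: per-robot success averaged over all robot-episodes."""
    if not results:
        return 0.0
    hits = sum(r.robot_a_success + r.robot_b_success for r in results)
    return hits / (2 * len(results))


# Toy data only: four episodes with invented outcomes.
episodes = [
    EpisodeResult(True, True),
    EpisodeResult(True, False),
    EpisodeResult(False, False),
    EpisodeResult(True, True),
]
print(both_success_rate(episodes))             # 0.5
print(mean_individual_success_rate(episodes))  # 0.625
```

Under this assumed definition BSR is stricter than averaging per-robot success: one robot failing zeroes out the whole episode, which is why the toy BSR (0.5) sits below the per-robot average (0.625).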

Key Contribution

Replacing static coordination with event-triggered dialogue and dynamic replanning lets robots coordinate on the fly, improving the both-success rate (BSR) in collaborative navigation by 69.2%.

Abstract

Long-horizon collaborative vision-language navigation (VLN) is critical for multi-robot systems to accomplish complex tasks beyond the capability of a single agent. CoNavBench took a first step by introducing a collaborative long-horizon VLN benchmark with relay-style multi-robot tasks, a collaboration taxonomy, and graph-grounded generation and evaluation to model handoffs and rendezvous in shared environments. However, existing benchmarks and evaluations often do not enforce strictly synchronized dual-robot rollout on a shared world timeline, and they typically rely on static coordination policies that cannot adapt when new cross-agent evidence emerges. We present Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation (DeCoNav), a decentralized framework that couples event-triggered dialogue with dynamic task allocation and replanning for real-time, adaptive coordination. In DeCoNav, robots exchange compact semantic states via dialogue without a central controller. When informative events such as new evidence, uncertainty, or conflicts arise, dialogue is triggered to dynamically reassign subgoals and replan under synchronized execution. Evaluated on DeCoNavBench, which comprises 1,213 tasks across 176 HM3D scenes, DeCoNav improves the both-success rate (BSR) by 69.2%, demonstrating the effectiveness of dialogue-driven, dynamically reallocated planning for multi-robot collaboration.
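The abstract describes the coordination loop only at a high level. The sketch below is a toy illustration under stated assumptions, not the DeCoNav implementation. It assumes a 1-D corridor world, hypothetical names (`SemanticState`, `allocate`, `rollout`), two of the three triggers (new cross-agent evidence and conflicting ownership; the uncertainty trigger is omitted for brevity), and an invented allocation rule (a robot that has seen a subgoal takes it, else the nearest robot). Robots advance in lockstep on one shared clock, and each robot replans locally from broadcast state copies, so no central controller is involved.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticState:
    """Compact state a robot broadcasts in dialogue (hypothetical fields)."""
    robot_id: str
    position: int  # toy 1-D corridor: integer positions
    subgoals: list[str]  # ordered subgoals this robot currently owns
    seen: set[str] = field(default_factory=set)  # subgoals it has observed


def allocate(messages: list[SemanticState], goal_pos: dict[str, int]) -> dict[str, list[str]]:
    """Deterministic reallocation rule every robot runs locally (assumed rule).

    Each pending subgoal goes to a robot that has seen it, else to the nearest
    robot; ties break on robot_id. Identical inputs plus an identical rule give
    identical plans, so robots agree without a central controller.
    """
    pending = sorted({g for m in messages for g in m.subgoals})
    plan: dict[str, list[str]] = {m.robot_id: [] for m in messages}
    for g in pending:
        owner = min(messages, key=lambda m: (g not in m.seen, abs(m.position - goal_pos[g]), m.robot_id))
        plan[owner.robot_id].append(g)
    pos = {m.robot_id: m.position for m in messages}
    for rid, goals in plan.items():
        goals.sort(key=lambda g: abs(pos[rid] - goal_pos[g]))  # nearest-first order
    return plan


def rollout(states, goal_pos, max_ticks=50, sense_radius=1, dialogue=True):
    """Lockstep rollout on a shared world clock; returns (ticks or None, replan log)."""
    log = []
    for t in range(max_ticks):
        # 1) Perception: every robot observes on the same tick.
        event = False
        for s in states:
            visible = {g for g, p in goal_pos.items() if abs(p - s.position) <= sense_radius}
            new = visible - s.seen
            s.seen |= new
            owned_by_others = {g for o in states if o is not s for g in o.subgoals}
            if new & owned_by_others:  # trigger: new cross-agent evidence
                event = True
        all_goals = [g for s in states for g in s.subgoals]
        if len(all_goals) != len(set(all_goals)):  # trigger: conflicting ownership
            event = True
        # 2) Dialogue: broadcast state copies; each robot replans locally.
        if event and dialogue:
            messages = [SemanticState(s.robot_id, s.position, list(s.subgoals), set(s.seen)) for s in states]
            for s in states:
                s.subgoals = allocate(messages, goal_pos)[s.robot_id]
            log.append((t, {s.robot_id: list(s.subgoals) for s in states}))
        # 3) Synchronized action: every robot takes one step on the same tick.
        for s in states:
            if s.subgoals:
                target = goal_pos[s.subgoals[0]]
                s.position += (target > s.position) - (target < s.position)
                if s.position == target:
                    s.subgoals.pop(0)
        if not any(s.subgoals for s in states):
            return t + 1, log
    return None, log


goal_pos = {"mug": 1, "sofa": 8, "door": 9}
print(rollout([SemanticState("A", 0, ["mug", "sofa"]), SemanticState("B", 10, ["door"])], goal_pos))
print(rollout([SemanticState("A", 0, ["mug", "sofa"]), SemanticState("B", 10, ["door"])], goal_pos, dialogue=False))
```

In this toy run, robot B spots the sofa that robot A owns, which triggers a dialogue round that hands the sofa to B; the episode finishes in 2 ticks instead of the 8 it takes without dialogue. The numbers only illustrate the mechanism and say nothing about DeCoNav's reported results.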

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
