Mar 9, 2026, arXiv:2603.07973

VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments

Ning Liu, Sen Shen, Zheng Li, Sheng Liu, Dongkun Han, Shangke Lyu, Thomas Braunl

AI Summary

VORL-EXPLORE is a hybrid learning and planning framework for multi-robot exploration. It addresses the brittleness of hierarchical approaches in dense, dynamic environments by coupling task allocation with motion execution through "execution fidelity," a shared estimate of local navigability. The fidelity signal enters a Voronoi objective and drives adaptive arbitration between global A* guidance and a reactive RL policy, balancing long-range efficiency with safety. Experiments in randomized grids and a Gazebo factory scenario report high success rates, shorter paths, lower overlap, and robust collision avoidance. The framework also supports online self-supervised recalibration of the fidelity model, allowing it to adapt to non-stationary obstacles.

Key Contribution

By explicitly modeling and sharing "execution fidelity" – an estimate of local navigability – VORL-EXPLORE enables multi-robot exploration that avoids bottlenecks and oscillations common in dense, dynamic environments.
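To make the idea of fidelity-aware allocation concrete, here is a minimal Python sketch. It is not the authors' objective: the function names, the weights `w_fid` and `w_rep`, and the specific cost form (distance inflated by low fidelity, plus an inverse-distance repulsion term) are all assumptions chosen for illustration. The paper's actual fidelity-coupled Voronoi objective may differ substantially.

```python
import math

def frontier_cost(robot, frontier, fidelity, others, w_fid=2.0, w_rep=1.5):
    """Hypothetical cost of sending `robot` to `frontier`.

    fidelity: assumed to lie in [0, 1]; 1.0 means the region is easy to traverse.
    others:   positions of the other robots (illustrative repulsion sources).
    """
    dist = math.dist(robot, frontier)
    # Low fidelity inflates the effective travel distance.
    effective = dist * (1.0 + w_fid * (1.0 - fidelity))
    # Each other robot adds more cost the closer it is to this frontier.
    repulsion = sum(1.0 / (1.0 + math.dist(o, frontier)) for o in others)
    return effective + w_rep * repulsion

def assign_frontier(robot, frontiers, fidelities, others):
    """Return the index of the lowest-cost frontier for one robot."""
    costs = [frontier_cost(robot, f, q, others)
             for f, q in zip(frontiers, fidelities)]
    return min(range(len(frontiers)), key=costs.__getitem__)
```

Under these assumed weights, a nearer frontier loses to a slightly farther one when its fidelity is low or when another robot already sits on it, which is the contention-avoidance behavior the summary describes.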

Abstract

Hierarchical multi-robot exploration commonly decouples frontier allocation from local navigation, which can make the system brittle in dense and dynamic environments. Because the allocator lacks direct awareness of execution difficulty, robots may cluster at bottlenecks, trigger oscillatory replanning, and generate redundant coverage. We propose VORL-EXPLORE, a hybrid learning and planning framework that addresses this limitation through execution fidelity, a shared estimate of local navigability that couples task allocation with motion execution. This fidelity signal is incorporated into a fidelity-coupled Voronoi objective with inter-robot repulsion to reduce contention before it emerges. It also drives a risk-aware adaptive arbitration mechanism between global A* guidance and a reactive reinforcement learning policy, balancing long-range efficiency with safe interaction in confined spaces. The framework further supports online self-supervised recalibration of the fidelity model using pseudo-labels derived from recent progress and safety outcomes, enabling adaptation to non-stationary obstacles without manual risk tuning. We evaluate this capability separately in a dedicated severe-traffic ablation. Extensive experiments in randomized grids and a Gazebo factory scenario show high success rates, shorter path length, lower overlap, and robust collision avoidance. The source code will be made publicly available upon acceptance.
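The arbitration and recalibration ideas in the abstract can be illustrated with a short Python sketch. Everything here is an assumption for illustration only: the `Arbiter` class, its hysteresis thresholds, the hard switch between modes (the paper's risk-aware mechanism may blend or weight the two controllers instead), and the `pseudo_label` rule with its progress and clearance cutoffs are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Arbiter:
    """Hypothetical mode switch between global A* guidance and a reactive RL policy."""
    enter_rl: float = 0.4   # assumed: hand over to RL when fidelity drops below this
    exit_rl: float = 0.6    # assumed: return to A* only after fidelity recovers above this
    mode: str = "astar"

    def step(self, fidelity: float) -> str:
        # The gap between the two thresholds (hysteresis) prevents rapid
        # mode flipping when fidelity hovers near a single cutoff.
        if self.mode == "astar" and fidelity < self.enter_rl:
            self.mode = "rl"
        elif self.mode == "rl" and fidelity > self.exit_rl:
            self.mode = "astar"
        return self.mode

def pseudo_label(progress_m: float, min_clearance_m: float, collided: bool,
                 min_progress: float = 0.2, safe_clearance: float = 0.3) -> float:
    """Hypothetical self-supervised target for the fidelity model.

    Returns 1.0 when the recent window showed enough progress at a safe
    clearance with no collision, otherwise 0.0.
    """
    if collided:
        return 0.0
    ok = progress_m >= min_progress and min_clearance_m >= safe_clearance
    return 1.0 if ok else 0.0
```

In a loop, each robot would call `Arbiter.step` with its current fidelity estimate to pick a controller, and periodically feed `pseudo_label` outputs back as training targets for the fidelity model. A binary label is the simplest choice; a graded label in [0, 1] would be an equally plausible design.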

Robotics & Embodied AI, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
