Thien-Minh Nguyen

The University of Queensland, Australia
Corresponding author.

Abstract

Off-world multi-robot exploration is challenged by sparse targets, limited sensing, hazardous terrain, and restricted communication. Many scientifically valuable clues are visually ambiguous and often require close-range observations, making efficient and safe informative path planning essential. Existing methods often rely on predefined areas of interest (AOIs), which may be incomplete or biased, and typically handle terrain risk only through soft penalties, which are insufficient for avoiding non-recoverable regions. To address these issues, we propose a multi-agent informative path planning framework for sparse evidence discovery based on Gaussian belief mapping and dual-domain coverage. The method maintains Gaussian-process-based interest and risk beliefs and combines them with trajectory-intent representations to support coordinated sequential decision-making among multiple agents. It further prioritizes search inside the AOI while preserving limited exploration outside it, thereby improving robustness to AOI bias. In addition, the risk-aware design helps agents balance information gain and operational safety in hazardous environments. Experimental results in simulated lunar environments show that the proposed method consistently outperforms sampling-based and greedy baselines under different budgets and communication ranges. In particular, it achieves lower final uncertainty in risk-aware settings and remains robust under limited communication, demonstrating its effectiveness for cooperative off-world robotic exploration.

I Introduction

Off-world surface exploration increasingly demands autonomous multi-robot systems that can search for subtle scientific cues under severe mobility hazards and limited sensing.
Many high-value targets (e.g., ancient biological relics, biosignature-like cues, or fine-grained geological evidence) are small, visually ambiguous, and often confirmable only at close range. Consequently, an onboard camera typically has a narrow effective sensing footprint, and mission success depends critically on how robots allocate motion to acquire close-up observations rather than on long-range perception alone. Coverage planning and multi-robot exploration are natural responses to a limited sensing footprint: multiple agents can parallelize search and reduce time-to-discovery [21, 5]. However, two practical issues are frequently under-modeled in existing exploration and coverage formulations. First, search is rarely confined to a perfectly defined area of interest (AOI). AOIs are often specified by coarse priors (orbital cues, scientific hypotheses, or operator-defined polygons) and can be incomplete or biased. Optimizing coverage strictly within an AOI can therefore lead to systematic blind spots and reduced robustness when evidence lies outside the presumed region [11, 9]. Second, off-world terrains contain non-recoverable hazards and high-slip regions that can trap a rover (entering such a region is feasible, but exiting it is not reliably achievable). In such settings, soft risk penalties alone are often insufficient; planning must explicitly enforce recoverability to prevent trajectories that can lead to mission-ending states [12, 1].

Figure 1: Validation in Lunar Environment 1. To efficiently explore the AOI in hazardous lunar terrain, three agents operate collaboratively in parallel; their trajectories are shown as the yellow, blue, and purple curves. From bottom to top, the figure shows the original lunar environment, the mixed Gaussian map, and the PRM planning layer. The red topological nodes and Gaussian-shaped distributions mark forbidden regions, which are assigned a higher traversal cost during path search.
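The cost-shaping idea in the Figure 1 caption (Gaussian-shaped forbidden regions that raise traversal cost on a roadmap) can be illustrated with a small sketch. It assumes, hypothetically, that each hazard is an isotropic Gaussian bump `(cx, cy, sigma, peak)`, that an edge's cost is its Euclidean length inflated by the summed bump value at the edge midpoint scaled by a weight `beta`, and that a plain Dijkstra search runs over a PRM-like graph. The functions, the midpoint approximation, and `beta` are illustrative assumptions, not the paper's actual cost definition.

```python
import heapq
import math


def gaussian_risk(p, hazards):
    """Sum of isotropic Gaussian bumps; each hazard is (cx, cy, sigma, peak)."""
    x, y = p
    return sum(peak * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
               for cx, cy, sigma, peak in hazards)


def edge_cost(p, q, hazards, beta):
    # Hypothetical cost model: inflate the edge length by the hazard
    # density sampled at the edge midpoint.
    length = math.dist(p, q)
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return length * (1.0 + beta * gaussian_risk(mid, hazards))


def shortest_path(nodes, edges, start, goal, hazards, beta=10.0):
    """Dijkstra over an undirected roadmap; returns (cost, list of node indices)."""
    adj = {i: [] for i in range(len(nodes))}
    for i, j in edges:
        c = edge_cost(nodes[i], nodes[j], hazards, beta)
        adj[i].append((j, c))
        adj[j].append((i, c))
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, c in adj[u]:
            nd = d + c
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return math.inf, []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]
```

With `beta = 0` the search takes the straight route through a hazard centre; with a large `beta` the same graph yields a detour, which is the qualitative behavior the caption describes. Because the penalty only raises cost, a sufficiently valuable route can still pass near a hazard, which is why a separate hard constraint is needed for non-recoverable regions.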
This paper addresses these gaps in a unified multi-agent visual search framework implemented in a Gazebo off-world simulation environment. We consider a team of robots equipped with onboard cameras that must detect sparse evidence online. Detections are intermittent and spatially uncertain, and thus must be integrated into a representation that is both lightweight and planner-compatible. We maintain a sparse Gaussian evidence belief in the world frame, updated incrementally from onboard visual observations [9]. This belief supports principled replanning by quantifying where evidence is likely and where uncertainty remains high. On top of the belief map, we adopt an intent-based multi-agent planning architecture [23, 6]. At each replanning cycle, each agent proposes a small set of candidate intents (e.g., evidence chasing, frontier coverage), and a coordinator selects a non-conflicting subset that maximizes team-level marginal utility subject to motion-feasibility and collision-avoidance constraints. Crucially, we extend conventional AOI-centric coverage to a dual-domain objective that explicitly allocates search effort both inside and outside the AOI. The AOI is treated as a high-priority domain, while a controlled background coverage budget mitigates prior bias and enables discovery beyond the assumed region. To ensure operational safety, we incorporate terrain risk and recoverability constraints through a two-stage mechanism: (i) a terrain-derived risk field that discourages hazardous proximity and high-slip regions, and (ii) a hard safety layer that rejects candidate trajectories violating a recoverability criterion defined by dynamic safety buffers and local feasibility checks. This combination prevents "enter-but-not-exit" behaviors that can otherwise arise when the planner trades safety for short-term coverage gain [8, 13]. We evaluate the proposed system in diverse off-world simulation scenarios with varying hazard layouts, AOI bias levels, and evidence sparsity.
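The two-stage safety mechanism and the dual-domain objective described above can be sketched on a toy grid. This is a hedged illustration, not the paper's implementation: the paper defines recoverability through dynamic safety buffers and local feasibility checks, whereas this sketch assumes, hypothetically, a directed traversability model in which a rover can always descend but can climb at most `max_climb` per grid step. A candidate path is rejected by the hard layer (stage ii) if it is not drivable or if no feasible route leads from its endpoint back to its start; surviving candidates are ranked by a dual-domain utility with hypothetical weights `w_aoi` and `w_bg`, minus a soft risk penalty weighted by `lam` (stage i). All names and parameters are illustrative.

```python
from collections import deque

import numpy as np

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def can_move(elev, a, b, max_climb):
    # Hypothetical traversability model: descending is always allowed,
    # but climbing is limited to max_climb per grid step (a proxy for slip).
    return elev[b] - elev[a] <= max_climb


def reachable(elev, src, dst, max_climb):
    """Breadth-first search over directed moves; True if dst is reachable from src."""
    rows, cols = elev.shape
    seen, queue = {src}, deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return True
        for dr, dc in NEIGHBORS:
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in seen and can_move(elev, cur, nxt, max_climb)):
                seen.add(nxt)
                queue.append(nxt)
    return False


def is_recoverable(elev, path, max_climb):
    """Stage (ii), hard layer: the path must be drivable AND allow a way back to its start."""
    drivable = all(can_move(elev, a, b, max_climb) for a, b in zip(path, path[1:]))
    return drivable and reachable(elev, path[-1], path[0], max_climb)


def score(path, gain, aoi_mask, risk, w_aoi=1.0, w_bg=0.3, lam=2.0):
    """Dual-domain utility minus stage (i), the soft risk penalty (weights are hypothetical)."""
    cells = set(path)
    utility = sum((w_aoi if aoi_mask[c] else w_bg) * gain[c] for c in cells)
    return utility - lam * sum(risk[c] for c in cells)


def select(candidates, elev, gain, aoi_mask, risk, max_climb):
    """Reject unrecoverable candidates first, then pick the highest-scoring survivor."""
    feasible = [p for p in candidates if is_recoverable(elev, p, max_climb)]
    if not feasible:
        return None
    return max(feasible, key=lambda p: score(p, gain, aoi_mask, risk))
```

Under this toy model, a path that drops into a pit can score higher on raw utility yet still be rejected, because climbing back out exceeds `max_climb`. That is the enter-but-not-exit case a soft penalty alone would not rule out. Filtering before scoring keeps the hard constraint independent of how the utility weights are tuned.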
Our results show that dual-domain exploration improves out-of-AOI discovery and reduces failure modes under AOI misspecification, while recoverability constraints significantly reduce mission-ending traps with minimal loss in search efficiency. The contributions of this paper can be summarized as follows:
• A multi-agent off-world visual search framework that fuses intermittent detections into a sparse GP-based evidence belief for online replanning.
• A dual-domain, intent-aware cooperative planning strategy that optimizes coverage both inside the AOI and in the background region, and leverages trajectory intent to reduce redundant exploration and achieve lower final uncertainty under shared budgets.
• A risk-aware belief and decision-making mechanism that maintains a GP-based terrain risk belief and integrates it into planning to improve exploration quality and stability in hazardous environments.

II Related Works

II-A Adaptive Single-Agent IPP

Informative Path Planning (IPP) has a long research history. Hollinger and Sukhatme [11] formulated IPP as a trajectory optimization problem that maximizes an information metric under a budget constraint, and pointed out that such problems typically have high computational complexity (e.g., they are NP-hard or PSPACE-hard). Traditional single-agent IPP methods can be broadly grouped into two categories: sampling-based methods, and viewpoint/subgoal selection combined with continuous trajectory optimization. Representative sampling-based approaches, such as the RIG (Rapidly-exploring Information Gathering) family [12, 8, 13], extend RRT/RRG-style planners to informative path planning by combining sampling-based search with branch-and-bound pruning, enabling efficient trajectory search in continuous spaces under motion constraints while offering asymptotic optimality and scalability.
In contrast, viewpoint/subgoal-based methods [2, 15, 4] typically first identify informative sensing targets and then generate feasible trajectories via local optimization or trajectory generation; for example, Hitz et al. [9] proposed a continuous-space IPP framework that combines Gaussian process modeling with evolutionary optimization and supports online replanning. Recently, single-agent IPP has shifted from offline planning to adaptive IPP (AIPP). In this setting, deep reinforcement learning (DRL) is widely used to learn a mapping from belief states to actions, reducing online replanning cost. Recent studies further incorporate graph representations and attention mechanisms [16, 3] to improve global context modeling and mitigate the short-sightedness of local decision-making.

II-B Multi-Agent IPP

Multi-agent informative path planning (MAIPP) remains less explored than its single-agent counterpart. Viseras et al. [21] addressed MAIPP by combining greedy planning with collision avoidance; however, prior studies in single-agent IPP have shown that greedy strategies often lead to short-sighted decisions and thus degrade long-term information-gathering efficiency, a limitation that is typically more pronounced in cooperative multi-agent settings [9, 12, 1, 17]. A promising direction is to extend effective single-agent IPP planners to the multi-agent setting and to enable distributed coordination through Sequential Greedy Assignment (SGA) [5]: agents plan their paths sequentially in priority order, and each subsequent agent explicitly conditions on the paths already assigned to higher-priority agents, yielding scalable cooperative planning. In recent years, deep reinforcement learning has also been introduced to MAIPP to reduce online computation by learning distributed coordination policies.
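The Sequential Greedy Assignment scheme described above can be sketched as follows. This is a minimal illustration under simplifying assumptions: each agent has a small, precomputed set of candidate paths over discrete graph nodes, and the marginal gain is a hypothetical coverage value that counts only nodes not yet claimed by higher-priority agents. In a GP-based planner, conditioning would instead update the belief with the observations planned by earlier agents. The names `sequential_greedy_assignment` and `coverage_gain` are illustrative, not taken from [5].

```python
from typing import Callable, Dict, List, Sequence, Set

Path = List[int]
GainFn = Callable[[Path, Set[int]], float]


def sequential_greedy_assignment(
    agents: Sequence[str],
    candidates: Dict[str, List[Path]],
    gain: GainFn,
) -> Dict[str, Path]:
    """Plan agents one at a time in priority order.

    Each agent picks the candidate path with the highest marginal gain given
    the nodes already claimed by higher-priority agents, then claims its nodes.
    """
    claimed: Set[int] = set()
    plan: Dict[str, Path] = {}
    for agent in agents:  # assumed sorted from highest to lowest priority
        best = max(candidates[agent], key=lambda p: gain(p, claimed))
        plan[agent] = best
        claimed |= set(best)
    return plan


def coverage_gain(path: Path, claimed: Set[int], value: Dict[int, float]) -> float:
    """Hypothetical marginal gain: total value of nodes on `path` not yet claimed."""
    return sum(value[n] for n in set(path) - claimed)


# Usage: agent B avoids the high-value nodes that agent A has already claimed.
value = {0: 1.0, 1: 5.0, 2: 5.0, 3: 1.0, 4: 4.0}
candidates = {"A": [[0, 1, 2], [3, 4]], "B": [[1, 2], [3, 4]]}
plan = sequential_greedy_assignment(
    ["A", "B"], candidates, lambda p, c: coverage_gain(p, c, value))
print(plan)  # {'A': [0, 1, 2], 'B': [3, 4]}
```

Because later agents see only what earlier agents have claimed, the outcome depends on the priority order: with B planned first, B takes nodes 1 and 2 and A falls back to `[3, 4]`.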
Intent-sharing and attention-based methods [23, 6] further improve coordination by exchanging probabilistic predictions of future agent positions, but in adaptive settings with continuously updated beliefs, accumulated prediction errors may still degrade long-horizon planning performance.

Figure 2: Overview of our framework. On the constructed map, our method maintains two GP beliefs (an interest GP and a risk GP) and combines them with the agents' intents to construct an augmented graph. The neural network consists of two main parts: an encoder and a decoder. Given the node inputs, the encoder uses a self-attention block to capture global node beliefs and inter-node relationships, producing context-aware node features. The decoder takes as input the features of the current node and its neighboring nodes, the planning state, and the mask. Finally, the network outputs the value and the action.

III Background

III-A Gaussian Process (GP)

In informative path planning (IPP), both the latent high-value signal (interest) and terrain risk are modeled as continuous functions over a
