$1\times 10^{-4}$, decayed by a factor of $0.96$ every $32$ steps, and a batch size of $256$. PPO performs $8$ update epochs per training step. Training is conducted on a workstation with a 16-core AMD EPYC 7542 CPU and four NVIDIA GeForce RTX 3090 GPUs, and converges in approximately $24$ hours ($\sim 17{,}000$ episodes).

V Comparison and Analysis

In all experiments, the agents share the same initial position $v_{1}$ and the same budget $B$. Based on this setting, we aggregate each agent's sampled predicted trajectories $\{g^{i}(t)\}$ into trajectory intent (TI), where the intent distribution is obtained by fitting a Gaussian distribution (GD) over all nodes along the sampled trajectories. A visual example of the resulting intent representation is shown in Fig. 3.

TABLE I: Comparison between different domain-aware variants in terms of $\mathrm{Tr}(P_{f})$ for 3 agents (10 trials on 30 instances).

| Method | Interest | Risk | Budget 2 mean | Budget 2 std | Budget 3 mean | Budget 3 std | Budget 4 mean | Budget 4 std | Budget 5 mean | Budget 5 std |
|---|---|---|---|---|---|---|---|---|---|---|
| SGA + RRT (0.9,1.0) | ✓ | × | 182.53 | 103.83 | 139.49 | 96.95 | 97.45 | 86.52 | 44.64 | 35.89 |
| SGA + RRT (0.3,0.4) | ✓ | × | 200.46 | 203.52 | 270.32 | 302.05 | 154.26 | 252.81 | 214.67 | 219.22 |
| CAtNIPP + Greedy | ✓ | × | 45.59 | 35.38 | 26.27 | 31.21 | 21.48 | 31.56 | 23.42 | 31.13 |
| CAtNIPP + TI (8,5) | ✓ | × | 48.57 | 33.87 | 25.51 | 21.34 | 26.18 | 18.32 | 14.24 | 7.08 |
| Ours + TI (8,5) | ✓ | ✓ | 37.30 | 16.26 | 21.76 | 12.34 | 19.32 | 10.24 | 10.99 | 6.16 |

TABLE II: Ablation study in terms of final average $\mathrm{Tr}(P_{f})$ for 10 agents (10 trials on 30 instances, single-domain).
| Method | Budget 2 mean | Budget 2 std | Budget 3 mean | Budget 3 std | Budget 4 mean | Budget 4 std |
|---|---|---|---|---|---|---|
| SGA + RRT (0.9,1.0) | 32.63 | 1.82 | 21.23 | 1.78 | 15.26 | 1.14 |
| CAtNIPP + Greedy | 29.56 | 3.80 | 12.02 | 1.22 | 6.35 | 0.83 |
| CAtNIPP + TI (8,5) | 18.03 | 2.26 | 5.32 | 0.46 | 3.30 | 0.34 |
| Ours + TI (8,5) | 19.69 | 2.35 | 5.21 | 0.23 | 3.07 | 0.34 |

V-A Experimental Setup

We compare our method with two sequential greedy assignment (SGA) baselines, namely SGA-RRT [11] and Greedy-CAtNIPP [3], as well as the intent-based version of CAtNIPP [23]. (i) SGA-RRT [11] uses a conventional RRT-based planning strategy [22, 14], where agents plan sequentially by conditioning on both the current belief and the paths assigned to higher-priority agents. Each agent samples a set of candidate paths and selects the one that minimizes $\mathrm{Tr}(P_{f})$. In execution, only a short segment of the selected path (of length 0.2 in our experiments) is followed, after which the belief is updated and replanning is performed. (ii) Greedy-CAtNIPP [3] selects the nearest viewpoint at each decision step. (iii) Intent-CAtNIPP [23] extends CAtNIPP [3] to the multi-agent setting by incorporating intent information.

We report the final uncertainty $\mathrm{Tr}(P_{f})$ in all experiments. $\mathrm{Tr}(P_{f})$ is defined as the trace of the GP posterior covariance on the query grid and equals the sum of predictive variances over all grid locations. Lower values indicate more effective uncertainty reduction. The reported mean and std are computed over repeated runs: the mean summarizes the expected terminal uncertainty, and the std reflects the dispersion across runs and thus the stability of performance under the same evaluation protocol.

In Table I, we set the number of agents to $m=3$, and all methods are evaluated under the same budget settings. To mimic hazardous lunar terrain, we randomly generate 4–6 risk zones in each instance. Since the baselines do not model risk explicitly, risky waypoints are removed from the sampled roadmap nodes for those methods.
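To make the reported metric concrete, the following is a minimal sketch of how $\mathrm{Tr}(P_{f})$ could be computed on a query grid. It is not the authors' implementation: the squared-exponential kernel, its hyperparameters (`length_scale`, `signal_var`, `noise_var`), the $10\times 10$ unit-square grid, and the random observation locations are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3, signal_var=1.0):
    """Squared-exponential kernel between two sets of 2-D points (assumed kernel)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def posterior_covariance(x_obs, x_query, noise_var=1e-2, **kernel_args):
    """GP posterior covariance on x_query given noisy observations at x_obs."""
    k_oo = rbf_kernel(x_obs, x_obs, **kernel_args) + noise_var * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs, **kernel_args)
    k_qq = rbf_kernel(x_query, x_query, **kernel_args)
    # Solve a linear system rather than forming an explicit inverse.
    return k_qq - k_qo @ np.linalg.solve(k_oo, k_qo.T)


def trace_of_posterior(x_obs, x_query, **kwargs):
    """Tr(P_f): the sum of predictive variances over all query locations."""
    return float(np.trace(posterior_covariance(x_obs, x_query, **kwargs)))


# Hypothetical 10 x 10 query grid on the unit square.
side = np.linspace(0.0, 1.0, 10)
grid = np.stack(np.meshgrid(side, side), axis=-1).reshape(-1, 2)

# Hypothetical measurement locations: a short path and a longer one that extends it.
rng = np.random.default_rng(0)
few = rng.uniform(0.0, 1.0, size=(5, 2))
many = np.vstack([few, rng.uniform(0.0, 1.0, size=(20, 2))])

print("Tr(P_f), 5 observations: ", trace_of_posterior(few, grid))
print("Tr(P_f), 25 observations:", trace_of_posterior(many, grid))
```

With fixed hyperparameters, adding observations cannot increase the posterior variance at any grid point, so the second value is never larger than the first. This is why a lower $\mathrm{Tr}(P_{f})$ under the same budget indicates more informative measurement locations.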
Unlike these baselines, our method explicitly models both interest and risk, and we additionally report a risk-free variant in Table II. For the RRT baselines, RRT$(a,b)$ denotes that the selected trajectory length lies in the range $[a,b]$. For the TI-based variants, $(a,j)$ denotes that each agent samples $a$ candidate trajectories and each trajectory contains $j$ nodes. To further evaluate robustness under communication constraints, we also test communication ranges of 0.3 and 0.6, and compare them with the global communication setting in Table III.

V-B Result and Analysis

Our method performs best in the risk-aware three-agent setting and remains competitive across the other evaluated settings. Compared with the sampling-based SGA-RRT [11] and the local greedy baseline Greedy-CAtNIPP [3], our method achieves substantially lower final uncertainty, showing the benefit of jointly modeling trajectory intent and environmental risk.

In Table I, most methods benefit from larger budgets, and our method achieves the lowest mean $\mathrm{Tr}(P_{f})$ at every budget. For example, our method reduces $\mathrm{Tr}(P_{f})$ from 37.30 to 10.99 as the budget increases from 2 to 5, whereas Greedy-CAtNIPP [3] only decreases from 45.59 to 23.42 over the same range. Compared with Intent-CAtNIPP [23], the gain of our method is moderate but consistent; for instance, the result improves from 25.51 to 21.76 at budget 3 and from 14.24 to 10.99 at budget 5. The gap becomes much larger when compared with SGA-RRT [11], whose results remain above 100 in several settings. These results suggest that trajectory-intent modeling already improves coordination, while explicitly incorporating risk information further enhances planning quality in hazardous environments.

TABLE III: Performance under limited communication ranges, in terms of $\mathrm{Tr}(P_{f})$, for 3 agents (10 trials on 30 instances).

Method Communication Range
Forget predefined areas of interest: this multi-agent exploration framework uses Gaussian belief mapping to adaptively balance scientific discovery and safety in hazardous off-world environments.