This paper introduces a diffusion-model-based approach to active search that lets agents balance exploration and exploitation without relying on computationally expensive tree search. The authors identify an optimism bias in prior diffusion-based reinforcement learning approaches when they are applied to active search, and propose ways to mitigate it. They report that their algorithm outperforms standard offline reinforcement learning baselines in terms of full recovery rate and is computationally more efficient than tree search for cost-aware active decision making, with both single-agent and multi-agent teams.
Diffusion models can sample lookahead action sequences for active search at lower computational cost than tree search, provided the optimism bias of diffusion-based reinforcement learning is mitigated.
Active search for recovering objects of interest through online, adaptive decision making with autonomous agents requires trading off exploration of unknown environments against exploitation of prior observations in the search space. Prior work has proposed myopic, greedy approaches based on information gain and Thompson sampling for agents to actively decide query or search locations when the number of targets is unknown. Work on decision-making algorithms in such partially observable environments has also shown that agents capable of lookahead over a finite horizon outperform myopic policies for active search. Unfortunately, lookahead algorithms typically rely on building a computationally expensive search tree that is simulated and updated based on the agent's observations and a model of the environment dynamics. Instead, in this work, we leverage the sequence modeling abilities of diffusion models to sample lookahead action sequences that balance the exploration-exploitation trade-off for active search without building an exhaustive search tree. We identify the optimism bias in prior diffusion-based reinforcement learning approaches when applied to the active search setting and propose mitigating solutions for efficient cost-aware decision making with both single- and multi-agent teams. Our proposed algorithm outperforms standard baselines in offline reinforcement learning in terms of full recovery rate and is computationally more efficient than tree search in cost-aware active decision making.
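For readers unfamiliar with the myopic baselines the abstract mentions, the following is a minimal sketch of a Thompson-sampling search policy. It is not the paper's diffusion-based method, and it is not taken from the prior work cited there. The function name `thompson_search`, the per-cell Beta-Bernoulli belief, the `noise` parameter, and the 0.5 declaration threshold are all assumptions made for illustration.

```python
import random


def thompson_search(targets, n_cells, budget, noise=0.1, seed=0):
    """Myopic Thompson-sampling search over a row of cells (illustrative only).

    targets: set of cell indices holding an object (hidden from the agent).
    noise:   assumed probability that a single observation is wrong.
    Returns the list of queried cells and the set of cells declared targets.
    """
    rng = random.Random(seed)
    # Assumed belief model: an independent Beta(1, 1) prior on each cell's
    # detection rate, updated from binary observations.
    alpha = [1.0] * n_cells
    beta = [1.0] * n_cells
    queries = []

    for _ in range(budget):
        # Exploration comes from the randomness of the posterior draws;
        # exploitation comes from acting greedily on those draws.
        draws = [rng.betavariate(alpha[c], beta[c]) for c in range(n_cells)]
        cell = max(range(n_cells), key=draws.__getitem__)

        # Simulated noisy sensor: one step, no lookahead (hence "myopic").
        p_detect = 1.0 - noise if cell in targets else noise
        if rng.random() < p_detect:
            alpha[cell] += 1.0
        else:
            beta[cell] += 1.0
        queries.append(cell)

    # Declare a cell a target when its posterior mean exceeds 0.5.
    declared = {
        c for c in range(n_cells) if alpha[c] / (alpha[c] + beta[c]) > 0.5
    }
    return queries, declared


if __name__ == "__main__":
    queries, declared = thompson_search({1, 3}, n_cells=5, budget=50)
    print("declared targets:", sorted(declared))
```

Each query here depends only on the current belief. A lookahead policy, whether built with a search tree or sampled from a sequence model as the abstract proposes, would instead score whole sequences of future queries before committing to the next one.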