The paper introduces Coordinated Boltzmann MCTS (CB-MCTS), a decentralized multi-agent planning algorithm designed to improve exploration in environments with sparse or skewed rewards. CB-MCTS replaces the deterministic UCT action selection used in Dec-MCTS with a stochastic Boltzmann policy and adds a decaying entropy bonus to encourage sustained yet focused exploration. In simulations, the authors show that CB-MCTS outperforms Dec-MCTS in deceptive scenarios while remaining competitive on standard benchmarks.
Boltzmann exploration, previously studied in single-agent MCTS, is extended here to decentralized multi-agent planning to handle deceptive reward landscapes.
Decentralized Monte Carlo Tree Search (Dec-MCTS) is widely used for cooperative multi-agent planning but struggles in sparse or skewed reward environments. We introduce Coordinated Boltzmann MCTS (CB-MCTS), which replaces deterministic UCT with a stochastic Boltzmann policy and a decaying entropy bonus for sustained yet focused exploration. While Boltzmann exploration has been studied in single-agent MCTS, applying it in multi-agent systems poses unique challenges. CB-MCTS is the first to address this. We analyze CB-MCTS in the simple-regret setting and show in simulations that it outperforms Dec-MCTS in deceptive scenarios and remains competitive on standard benchmarks, providing a robust solution for multi-agent planning.
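To make the contrast between the two selection rules concrete, the following is a minimal single-node sketch, not the authors' implementation. The abstract does not give the exact form of the Boltzmann policy or of the entropy bonus, so this sketch relies on a standard fact: the softmax distribution with temperature tau is the maximizer of expected value plus tau times the policy entropy. Under that reading, a decaying entropy bonus is modeled here as a decaying temperature. The schedule `tau0 / sqrt(1 + N)`, the exploration constant `c`, and all function names are assumptions introduced for illustration, and the coordination between agents is omitted entirely.

```python
import math
import random


def uct_select(q_values, visit_counts, c=1.4):
    """Deterministic UCT baseline: argmax of Q + c * sqrt(ln N / n)."""
    total = sum(visit_counts)

    def score(i):
        if visit_counts[i] == 0:
            return float("inf")  # try unvisited actions first
        return q_values[i] + c * math.sqrt(math.log(total) / visit_counts[i])

    return max(range(len(q_values)), key=score)


def temperature(total_visits, tau0=1.0):
    """Hypothetical decay schedule (an assumption, not taken from the paper)."""
    return tau0 / math.sqrt(1.0 + total_visits)


def boltzmann_probs(q_values, tau):
    """Softmax over value estimates.

    This distribution maximizes E_p[Q] + tau * H(p), so tau plays the
    role of the entropy-bonus weight in this sketch.
    """
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / tau) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def boltzmann_select(q_values, total_visits, rng=random):
    """Sample an action; the policy sharpens as total_visits grows."""
    probs = boltzmann_probs(q_values, temperature(total_visits))
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With value estimates such as `[0.2, 0.5, 0.1]`, `boltzmann_probs` at `temperature(0)` spreads probability over all three actions, while at `temperature(10000)` nearly all mass sits on the second action. This is the sense in which exploration is sustained early and becomes focused later, whereas `uct_select` always returns a single action for a given set of statistics.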