This paper addresses the challenge of scaling distributed bandit submodular coordination under communication constraints by optimizing the agents' communication neighborhoods. The authors propose a method that limits information relay to one-hop communication and keeps inter-agent messages small, while optimizing communication neighborhoods via distributed online bandit optimization. The approach achieves near-optimal action coordination with an anytime suboptimality bound that remains strictly positive for arbitrary network topologies, even disconnected ones. Simulations show that it converges faster and performs better than benchmark methods.
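To make one-hop, action-only messaging concrete, here is a minimal sketch built on assumptions made for illustration only. The objective is coverage (the number of distinct targets covered), which is monotone submodular. Agents act in a fixed order, and a hypothetical `coordinate` helper lets each agent greedily maximize its marginal gain using only the actions that its in-neighbors have already chosen. This is not the paper's algorithm. It only shows why the choice of neighborhood affects the team's outcome.

```python
from typing import Dict, List, Set

# Hypothetical setup: each action covers a set of target IDs.
Action = str
Coverage = Dict[Action, Set[int]]


def coverage(actions: List[Action], cov: Coverage) -> int:
    """Team objective f(S): number of distinct targets covered (monotone submodular)."""
    covered: Set[int] = set()
    for a in actions:
        covered |= cov[a]
    return len(covered)


def coordinate(order: List[str],
               options: Dict[str, List[Action]],
               neighbors: Dict[str, List[str]],
               cov: Coverage) -> Dict[str, Action]:
    """Hypothetical one-hop coordination: agents act in a fixed order, and each
    picks the action with the largest marginal gain given ONLY the actions of
    its in-neighbors that have already acted. A message is a single action
    label and is never relayed further."""
    chosen: Dict[str, Action] = {}
    for agent in order:
        # One-hop, action-only information received from neighbors.
        heard = [chosen[j] for j in neighbors.get(agent, []) if j in chosen]
        base = coverage(heard, cov)
        # Greedy step: maximize the marginal gain f(heard + [a]) - f(heard).
        chosen[agent] = max(options[agent],
                            key=lambda a: coverage(heard + [a], cov) - base)
    return chosen
```

For example, with `cov = {"a1": {1, 2, 3}, "a2": {3, 4}, "b1": {1, 2}, "b2": {5}}`, agent A choosing from `a1, a2` and agent B choosing from `b1, b2`, the team covers 4 targets when B listens to A (B picks `b2`). When B has no neighbors, it picks the redundant `b1`, and the team covers only 3 targets.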
Achieve near-optimal multi-agent coordination in bandwidth-constrained environments by dynamically optimizing communication neighborhoods, even with limited information sharing.
We study how to scale distributed bandit submodular coordination under realistic communication constraints in bandwidth, data rate, and connectivity. We are motivated by multi-agent tasks of active situational awareness in unknown, partially-observable, and resource-limited environments, where the agents must coordinate through agent-to-agent communication. Our approach enables scalability by (i) limiting information relays to only one-hop communication and (ii) keeping inter-agent messages small by having each agent transmit only its own action information. Despite these information-access restrictions, our approach enables near-optimal action coordination by optimizing the agents' communication neighborhoods over time, through distributed online bandit optimization, subject to the agents' bandwidth constraints. In particular, our approach enjoys an anytime suboptimality bound that is also strictly positive for arbitrary network topologies, even disconnected ones. To prove the bound, we define the Value of Coordination (VoC), an information-theoretic metric that quantifies for each agent the benefit of information access to its neighbors. We validate the scalability and near-optimality of our approach in simulations: it is observed to converge faster and to outperform benchmarks for bandit submodular coordination, and it can even outperform benchmarks that are privileged with a priori knowledge of the environment.
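The abstract does not spell out the online bandit step in detail. The sketch below illustrates the general idea with a swapped-in textbook algorithm, EXP3, which is not necessarily the paper's method. One agent learns which single candidate neighbor to listen to each round (a bandwidth of one incoming message). After each round, it receives reward feedback only for the neighbor it chose. The class name, the `gamma` value, and the fixed per-neighbor rewards in `run_demo` are hypothetical. Handling a bandwidth of more than one neighbor would require a multi-selection variant, which is not shown here.

```python
import math
import random
from typing import List


class Exp3NeighborSelector:
    """Hypothetical EXP3 learner over candidate neighbors for one agent (bandwidth = 1)."""

    def __init__(self, candidates: List[str], gamma: float = 0.1, seed: int = 0):
        self.candidates = list(candidates)
        self.gamma = gamma                      # exploration rate in (0, 1]
        self.weights = [1.0] * len(self.candidates)
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        # Mix the normalized weights with uniform exploration.
        k = len(self.candidates)
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k for w in self.weights]

    def select(self) -> int:
        # Sample the index of the neighbor to listen to this round.
        probs = self.probabilities()
        return self.rng.choices(range(len(self.candidates)), weights=probs, k=1)[0]

    def update(self, index: int, reward: float) -> None:
        # Bandit feedback: only the chosen neighbor's reward (in [0, 1]) is seen.
        probs = self.probabilities()
        estimate = reward / probs[index]        # importance-weighted estimate
        k = len(self.candidates)
        self.weights[index] *= math.exp(self.gamma * estimate / k)
        # Rescale to avoid overflow; probabilities are scale-invariant.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def run_demo(rounds: int = 500) -> Exp3NeighborSelector:
    # Hypothetical rewards: how much listening to each neighbor helps this
    # agent, already scaled to [0, 1]. Neighbor "n2" is the most useful here.
    reward_of = {"n1": 0.2, "n2": 0.9, "n3": 0.2}
    learner = Exp3NeighborSelector(["n1", "n2", "n3"], gamma=0.1, seed=0)
    for _ in range(rounds):
        i = learner.select()
        learner.update(i, reward_of[learner.candidates[i]])
    return learner
```

After a few hundred rounds, `run_demo().probabilities()` should place most of its mass on `"n2"`, while the `gamma / k` floor keeps every neighbor explored occasionally. In a real deployment, the rewards would vary over time with the environment and with the other agents' choices. The adversarial-bandit formulation is intended to handle that kind of change.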