Search papers, labs, and topics across Lattice.
This paper addresses the Submodular Welfare Problem (SWP) under bandit feedback, extending it to a multi-agent combinatorial bandit (MA-CMAB) framework where agents are coupled through shared allocation constraints. They tackle the challenge of maximizing total welfare when agents have monotone submodular utilities and only receive bandit feedback on their allocations. The authors propose an explore-then-commit strategy with randomized assignments, achieving a regret of $\tilde{\mathcal{O}}(T^{2/3})$ compared to a $(1-1/e)$ benchmark, representing the first such guarantee for partition-based SWP under bandit feedback.
First regret bound for partition-based submodular welfare under bandit feedback is achieved by a novel multi-agent combinatorial bandit framework.
We study the \emph{Submodular Welfare Problem} (SWP), where items are partitioned among agents with monotone submodular utilities to maximize the total welfare under \emph{bandit feedback}. Classical SWP assumes full value-oracle access, achieving $(1-1/e)$ approximations via continuous-greedy algorithms. We extend this to a \emph{multi-agent combinatorial bandit} framework (\textsc{MA-CMAB}), where actions are partitions under full-bandit feedback with non-communicating agents. Unlike prior single-agent or separable multi-agent CMAB models, our setting couples agents through shared allocation constraints. We propose an explore-then-commit strategy with randomized assignments, achieving $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$ benchmark, the first such guarantee for partition-based submodular welfare problem under bandit feedback.