Beijing AI Safety · CAS · Mar 3, 2026 · arXiv:2603.02630

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

Zhi Hong, Qian Zhang, Jiahang Sun, Zhiwei Shang, Mingze Kong, Xiangyi Wang

AI Summary

This paper introduces MASPOB, a bandit-based framework for optimizing the prompts of multi-agent systems (MAS) that use LLMs as their cognitive backbone. MASPOB targets three challenges (sample efficiency, topology-induced coupling among prompts, and combinatorial explosion of the search space) by combining the Upper Confidence Bound (UCB) rule to balance exploration and exploitation, Graph Neural Networks (GNNs) to capture structural priors, and coordinate ascent to decompose the optimization into univariate sub-problems. Experiments on diverse benchmarks show that MASPOB achieves state-of-the-art performance relative to existing baselines.
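The UCB idea mentioned above can be illustrated with the classic UCB1 rule over a fixed pool of candidate prompts for a single agent. This is a minimal sketch under stated assumptions, not MASPOB's scorer: each candidate is treated as an independent arm with an empirical mean reward, and the `evaluate` callback, the candidate names, and the exploration constant `c` are hypothetical stand-ins for an expensive MAS evaluation.

```python
import math
import random
from typing import Callable, List


def ucb1_select(counts: List[int], means: List[float], t: int, c: float = 2.0) -> int:
    """Return the index of the arm with the highest UCB1 score at round t (t >= 1)."""
    # Try every arm once before confidence bounds are defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Empirical mean plus an exploration bonus that shrinks as an arm is sampled more.
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=lambda i: scores[i])


def optimize_prompt(
    candidates: List[str], evaluate: Callable[[str], float], budget: int
) -> str:
    """Spend `budget` (possibly noisy) evaluations, then return the best-mean prompt."""
    counts = [0] * len(candidates)
    means = [0.0] * len(candidates)
    for t in range(1, budget + 1):
        i = ucb1_select(counts, means, t)
        reward = evaluate(candidates[i])  # hypothetical: one expensive MAS run
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # running-mean update
    return candidates[max(range(len(candidates)), key=lambda i: means[i])]


if __name__ == "__main__":
    # Hypothetical candidates with hidden success rates, observed through 0/1 rewards.
    rng = random.Random(0)
    hidden = {"prompt A": 0.55, "prompt B": 0.70, "prompt C": 0.60}
    best = optimize_prompt(list(hidden), lambda p: float(rng.random() < hidden[p]), budget=300)
    print("selected:", best)
```

Because UCB1 treats each prompt as an independent arm, it ignores the topology-induced coupling between agents' prompts; the sketch only shows how an uncertainty bonus steers a limited evaluation budget toward promising candidates.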

Key Contribution

A bandit-based approach to prompt optimization for LLM-driven multi-agent systems that uses GNNs to learn topology-aware representations of prompt semantics, achieving state-of-the-art performance.
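To make "topology-aware representations" concrete, here is a toy one-layer message-passing step in plain Python: each agent's prompt embedding is mixed with the mean embedding of its upstream agents, and the node vectors are then pooled into a single score. This is a generic GNN illustration under loud assumptions, not the architecture described in the paper; the agent names (`planner`, `solver`, `verifier`), the embedding vectors, the mixing weight `alpha`, and the readout weights are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_pass(
    embeddings: Dict[str, Vector],
    edges: List[Tuple[str, str]],
    alpha: float = 0.5,
) -> Dict[str, Vector]:
    """One synchronous round of mean aggregation over incoming edges (src -> dst).

    New vector = alpha * own + (1 - alpha) * mean(predecessor vectors).
    Nodes without predecessors keep their own embedding.
    """
    preds: Dict[str, List[str]] = {node: [] for node in embeddings}
    for src, dst in edges:
        preds[dst].append(src)
    updated: Dict[str, Vector] = {}
    for node, own in embeddings.items():
        if not preds[node]:
            updated[node] = list(own)
            continue
        k = len(preds[node])
        mean = [sum(embeddings[p][d] for p in preds[node]) / k for d in range(len(own))]
        updated[node] = [alpha * o + (1 - alpha) * m for o, m in zip(own, mean)]
    return updated


def graph_score(embeddings: Dict[str, Vector], readout: Vector) -> float:
    """Mean-pool node vectors and apply a hypothetical linear readout."""
    nodes = list(embeddings.values())
    pooled = [sum(v[d] for v in nodes) / len(nodes) for d in range(len(readout))]
    return sum(w * x for w, x in zip(readout, pooled))


if __name__ == "__main__":
    # Hypothetical 2-d prompt embeddings for a three-agent pipeline.
    emb = {"planner": [1.0, 0.0], "solver": [0.0, 1.0], "verifier": [0.5, 0.5]}
    chain = message_pass(emb, [("planner", "solver"), ("solver", "verifier")])
    print(chain, graph_score(chain, [1.0, 0.0]))
```

Reversing the edge direction changes the updated vectors even though the prompt embeddings themselves are identical, which is the sense in which such a representation depends on the workflow topology.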

Abstract

Large Language Models (LLMs) have achieved great success in many real-world applications, especially as the cognitive backbone of Multi-Agent Systems (MAS) that orchestrate complex workflows in practice. Because many deployment scenarios preclude modifying the MAS workflow, and MAS performance is highly sensitive to the input prompts, prompt optimization emerges as a more natural way to improve performance. However, real-world prompt optimization for MAS is impeded by three key challenges: (1) the need for sample efficiency due to prohibitive evaluation costs, (2) topology-induced coupling among prompts, and (3) the combinatorial explosion of the search space. To address these challenges, we introduce MASPOB (Multi-Agent System Prompt Optimization via Bandits), a novel sample-efficient framework based on bandits. By leveraging the Upper Confidence Bound (UCB) to quantify uncertainty, the bandit framework balances exploration and exploitation, maximizing gains within a strictly limited budget. To handle topology-induced coupling, MASPOB integrates Graph Neural Networks (GNNs) to capture structural priors, learning topology-aware representations of prompt semantics. Furthermore, it employs coordinate ascent to decompose the optimization into univariate sub-problems, reducing search complexity from exponential to linear. Extensive experiments across diverse benchmarks demonstrate that MASPOB achieves state-of-the-art performance, consistently outperforming existing baselines.
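The exponential-to-linear claim about coordinate ascent can be sketched as follows. With N agents and K candidate prompts per agent, exhaustive search scores K^N joint configurations, while one sweep of coordinate ascent varies one agent's prompt at a time with the others held fixed, costing at most N(K-1) score calls. This is a minimal sketch under assumptions: `toy_score` is a hypothetical stand-in for an expensive MAS evaluation, and here each univariate sub-problem is solved by scoring every candidate, which is a simplification rather than the paper's procedure.

```python
from typing import Callable, List, Sequence, Tuple

Config = Tuple[str, ...]


def coordinate_ascent(
    pools: Sequence[Sequence[str]],
    score: Callable[[Config], float],
    max_sweeps: int = 5,
) -> Tuple[Config, float, int]:
    """Greedy coordinate ascent over per-agent prompt pools.

    Each inner loop is a univariate sub-problem: vary agent i's prompt while
    all other agents' prompts stay fixed. Returns (config, score, #score calls).
    """
    config: List[str] = [pool[0] for pool in pools]  # start: first candidate per agent
    best = score(tuple(config))
    calls = 1
    for _ in range(max_sweeps):
        improved = False
        for i, pool in enumerate(pools):
            for cand in pool:
                if cand == config[i]:
                    continue
                trial = config.copy()
                trial[i] = cand
                value = score(tuple(trial))
                calls += 1
                if value > best:
                    best, config, improved = value, trial, True
        if not improved:  # a full sweep with no gain: local optimum reached
            break
    return tuple(config), best, calls


def toy_score(config: Config) -> float:
    """Hypothetical objective: per-agent quality plus a bonus when agents 0 and 1 agree."""
    quality = sum(int(p[1]) for p in config)
    coupling = 1.0 if config[0][1] == config[1][1] else 0.0
    return quality + coupling


if __name__ == "__main__":
    pools = [[f"{agent}{k}" for k in range(3)] for agent in "abc"]  # N = 3, K = 3
    cfg, value, calls = coordinate_ascent(pools, toy_score)
    print(cfg, value, calls, "score calls vs", 3 ** 3, "for exhaustive search")
```

On this toy problem the search reaches `("a2", "b2", "c2")` with 13 score calls, compared with 27 for exhaustive enumeration; the gap widens rapidly as N grows. The trade-off is that coordinate ascent is greedy and can stop at a local optimum when prompts are strongly coupled.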

Natural Language Processing · Reasoning & Chain-of-Thought · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
