The paper introduces MAEBE, a framework for evaluating emergent risks in multi-agent LLM ensembles, addressing the limitations of single-agent AI safety evaluations. Using MAEBE with the Greatest Good Benchmark and a novel double-inversion question technique, the authors show that LLM moral preferences are brittle and that ensemble moral reasoning cannot be predicted from the behavior of isolated agents. They find that phenomena such as peer pressure can significantly influence ensemble behavior, even under supervision, highlighting new safety and alignment challenges.
LLM ensembles exhibit surprisingly brittle moral preferences and unpredictable emergent behaviors such as peer pressure, even under supervision, demanding a shift away from evaluations of isolated agents.
Traditional AI safety evaluations of isolated LLMs are insufficient as multi-agent AI ensembles become prevalent, introducing novel emergent risks. This paper introduces the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework to systematically assess such risks. Using MAEBE with the Greatest Good Benchmark (and a novel double-inversion question technique), we demonstrate that: (1) LLM moral preferences, particularly for Instrumental Harm, are surprisingly brittle and shift significantly with question framing, both in single agents and in ensembles. (2) The moral reasoning of LLM ensembles is not directly predictable from isolated agent behavior due to emergent group dynamics. (3) Specifically, ensembles exhibit phenomena such as peer pressure influencing convergence, even when guided by a supervisor, highlighting distinct safety and alignment challenges. Our findings underscore the necessity of evaluating AI systems in their interactive, multi-agent contexts.