This paper introduces ARMOR-MAD, a training-free framework for heterogeneous multi-agent debate (MAD) that treats debate as conditional computation. The framework combines Pre-debate Agreement Routing (PAR), Early Agreement Stopping Evaluator (EASE), and Semantic Outlier Detection (SOD), and improves over fixed-round heterogeneous debate with the same model pool, reaching 65.5% accuracy on MATH Level 5 and 96.5% on GSM8K. The results suggest that model heterogeneity and agreement-based control can reduce wasted computation and error amplification in MAD systems.
ARMOR-MAD reaches up to 96.5% accuracy (on GSM8K) by routing questions to debate only when needed, showing the value of adaptive computation in multi-agent debate among large language models.
Multi-agent debate (MAD) can improve large language model reasoning, but fixed debate pipelines often waste computation and can amplify correlated errors among similar agents. We propose ARMOR-MAD, a training-free heterogeneous MAD framework that treats debate as conditional computation. ARMOR-MAD combines three components: Pre-debate Agreement Routing (PAR) decides whether independently generated Round-0 answers require debate; Early Agreement Stopping Evaluator (EASE) stops debate after convergence; and Semantic Outlier Detection (SOD) down-weights abnormal final answers during aggregation. Across MATH Level 5, GSM8K, MMLU, and MMLU-Pro, ARMOR-MAD consistently improves over fixed-round heterogeneous debate with the same model pool, reaching 65.5%, 96.5%, 90.0%, and 81.5% accuracy, respectively. The results suggest that genuine model heterogeneity and agreement-based control are both important for making MAD more accurate and efficient.
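The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract gives no thresholds, prompts, or scoring rules. The names `Agent`, `agreement`, `weighted_vote`, and `conditional_debate`, the exact-match agreement measure, the threshold defaults, and the singleton-answer rule standing in for SOD are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical agent interface: (question, peers' previous answers) -> answer.
Agent = Callable[[str, Sequence[str]], str]


def agreement(answers: Sequence[str]) -> float:
    """Fraction of answers equal to the most common answer (exact match)."""
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def weighted_vote(answers: Sequence[str], outlier_weight: float = 0.5) -> str:
    """Vote in which an answer given by only one agent counts for less.

    Assumption: a crude stand-in for SOD, which the abstract describes only
    as down-weighting abnormal final answers during aggregation.
    """
    counts = Counter(answers)
    scores = {}
    for answer, count in counts.items():
        is_outlier = count == 1 and len(counts) > 1
        scores[answer] = count * (outlier_weight if is_outlier else 1.0)
    return max(scores, key=scores.get)


def conditional_debate(
    question: str,
    agents: List[Agent],
    max_rounds: int = 3,
    route_threshold: float = 1.0,  # assumed: debate unless Round 0 is unanimous
    stop_threshold: float = 1.0,   # assumed: stop once answers are unanimous
) -> Tuple[str, int]:
    """Return (final answer, number of debate rounds actually run)."""
    # Round 0: every agent answers independently, seeing no peer answers.
    answers = [agent(question, []) for agent in agents]
    rounds_used = 0

    # PAR-like routing: skip debate when Round-0 answers already agree.
    if agreement(answers) < route_threshold:
        for _ in range(max_rounds):
            # Each agent revises after seeing the previous round's answers.
            answers = [agent(question, answers) for agent in agents]
            rounds_used += 1
            # EASE-like stopping: end debate once the answers have converged.
            if agreement(answers) >= stop_threshold:
                break

    # SOD-like aggregation over the final answers.
    return weighted_vote(answers), rounds_used
```

In this sketch a question on which all agents agree at Round 0 costs one call per agent and no debate rounds, which is the sense in which debate becomes conditional computation. A real system would likely replace exact string matching with answer normalization or a semantic similarity measure.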