The paper introduces the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a multi-agent framework for improving LLM alignment with human values. VAS-CFA instantiates multiple moral agents, each fine-tuned to represent a different normative perspective, and fuses their outputs using Combinatorial Fusion Analysis (CFA). Experiments show that VAS-CFA outperforms single-agent baselines and other aggregation methods on standard alignment metrics, demonstrating the effectiveness of multi-agent fusion.
LLMs can be better aligned with human values by fusing the outputs of multiple "moral agents" representing diverse ethical perspectives, outperforming single-agent approaches.
Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes multi-agent fusion for alignment. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages cognitive diversity between agents to mitigate conflicts and redundancies across their outputs, producing responses that better reflect human values. Empirical evaluation demonstrates that VAS-CFA outperforms both single-agent baselines and prior aggregation approaches on standard metrics, showing that multi-agent fusion provides a robust and effective mechanism for advancing value alignment in LLMs.
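As a rough illustration of the rank- and score-based aggregation the abstract describes, the sketch below fuses preference scores from several hypothetical moral agents over a set of candidate responses. The agent labels, candidate names, and min-max normalization are assumptions for the example, not details taken from the paper, which does not specify its exact fusion formulas here.

```python
# Minimal sketch of rank- and score-based fusion in the spirit of CFA.
# All names and numbers below are illustrative, not from the paper.

def ranks_from_scores(scores):
    """Map each candidate to its rank (1 = best) under one agent's scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cand: r for r, cand in enumerate(ordered, start=1)}

def score_fusion(agent_scores):
    """Score combination: average min-max-normalized scores across agents."""
    fused = {}
    for scores in agent_scores:
        lo, hi = min(scores.values()), max(scores.values())
        for cand, s in scores.items():
            norm = (s - lo) / (hi - lo) if hi > lo else 0.0
            fused[cand] = fused.get(cand, 0.0) + norm / len(agent_scores)
    return fused

def rank_fusion(agent_scores):
    """Rank combination: average ranks across agents (lower is better)."""
    fused = {}
    for scores in agent_scores:
        for cand, r in ranks_from_scores(scores).items():
            fused[cand] = fused.get(cand, 0.0) + r / len(agent_scores)
    return fused

# Each dict maps candidate responses to one moral agent's preference score.
agents = [
    {"resp_a": 0.9, "resp_b": 0.4, "resp_c": 0.7},  # e.g. deontological agent
    {"resp_a": 0.3, "resp_b": 0.8, "resp_c": 0.6},  # e.g. consequentialist agent
    {"resp_a": 0.5, "resp_b": 0.5, "resp_c": 0.9},  # e.g. virtue-ethics agent
]

fused_scores = score_fusion(agents)
fused_ranks = rank_fusion(agents)
print(max(fused_scores, key=fused_scores.get))  # best under score combination
print(min(fused_ranks, key=fused_ranks.get))    # best under rank combination
```

In CFA, the two combinations need not agree; disagreement between the score-fused and rank-fused winners is itself a diversity signal that can guide which fusion to trust for a given query.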