Great Bay UniversityGuangming LaboratoryApr 14, 2026arXiv:2604.12735

AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition

Zeheng Wang, Zitong Yu, Yijie Zhu, Haochen Liang, Taorui Wang, Wei Xia, Jiayu Zhang, Zhishu Liu, Hui Ma, Fei Ma, Qi Tian

AI Summary

The paper introduces AffectAgent, a multi-agent retrieval-augmented generation framework for improved multimodal emotion recognition. It uses three specialized agents (query planner, evidence filter, emotion generator) optimized via MAPPO with a shared affective reward to perform analytical reasoning, retrieve cross-modal samples, and generate predictions. The framework also incorporates Modality-Balancing Mixture of Experts (MB-MoE) and Retrieval-Augmented Adaptive Fusion (RAAF) to handle cross-modal heterogeneity and missing modalities, respectively, leading to state-of-the-art results on MER-UniBench.

Key Contribution

Multi-agent collaboration and retrieval augmentation can overcome the limitations of static parametric memory in LLMs, enabling more nuanced and accurate multimodal emotion recognition.

Abstract

LLM-based multimodal emotion recognition relies on static parametric memory and often hallucinates when interpreting nuanced affective states. In this paper, given that single-round retrieval-augmented generation is highly susceptible to modal ambiguity and therefore struggles to capture complex affective dependencies across modalities, we introduce AffectAgent, an affect-oriented multi-agent retrieval-augmented generation framework that leverages collaborative decision-making among agents for fine-grained affective understanding. Specifically, AffectAgent comprises three jointly optimized specialized agents, namely a query planner, an evidence filter, and an emotion generator, which collaboratively perform analytical reasoning to retrieve cross-modal samples, assess evidence, and generate predictions. These agents are optimized end-to-end using Multi-Agent Proximal Policy Optimization (MAPPO) with a shared affective reward to ensure consistent emotion understanding. Furthermore, we introduce Modality-Balancing Mixture of Experts (MB-MoE) and Retrieval-Augmented Adaptive Fusion (RAAF), where MB-MoE dynamically regulates the contributions of different modalities to mitigate representation mismatch caused by cross-modal heterogeneity, while RAAF enhances semantic completion under missing-modality conditions by incorporating retrieved audiovisual embeddings. Extensive experiments on MER-UniBench demonstrate that AffectAgent achieves superior performance across complex scenarios. Our code will be released at: https://github.com/Wz1h1NG/AffectAgent.

Multimodal Models Recommendation & Information Retrieval Tool Use & Agents

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition

Related Papers