Baidu | Brown | PKU | Apr 30, 2026 | arXiv:2604.27934

MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection

Weihai Lu, Zhejun Zhao, Yanshu Li, Huan He

AI Summary

The paper introduces MM-StanceDet, a multi-agent framework for multimodal stance detection that uses retrieval augmentation for contextual grounding and specialized agents for multimodal analysis. The framework incorporates a reasoning-enhanced debate stage and self-reflection to improve robustness. Experiments across five datasets show MM-StanceDet significantly outperforms existing state-of-the-art baselines, demonstrating the effectiveness of its architecture in handling complex multimodal stance detection challenges.

Key Contribution

Multi-agent debate and self-reflection can dramatically improve multimodal stance detection, even when text and images present conflicting signals.

Abstract

Multimodal Stance Detection (MSD) is crucial for understanding public discourse, yet effectively fusing text and image, especially with conflicting signals, remains challenging. Existing methods often face difficulties with contextual grounding, cross-modal interpretation ambiguity, and single-pass reasoning fragility. To address these, we propose Retrieval-Augmented Multi-modal Multi-agent Stance Detection (MM-StanceDet), a novel multi-agent framework integrating Retrieval Augmentation for contextual grounding, specialized Multimodal Analysis agents for nuanced interpretation, a Reasoning-Enhanced Debate stage for exploring perspectives, and Self-Reflection for robust adjudication. Extensive experiments on five datasets demonstrate MM-StanceDet significantly outperforms state-of-the-art baselines, validating the efficacy of its multi-agent architecture and structured reasoning stages in addressing complex multimodal stance challenges.
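The abstract names four stages (retrieval augmentation, multimodal analysis, debate, self-reflection) but gives no implementation details. The following is only an illustrative sketch of how such a staged pipeline could be wired together. Every name in it (`Post`, `retrieve_context`, `make_debater`, `reflect`, `detect_stance`), the cue-word lists, and the tie-breaking rule are hypothetical placeholders, and the keyword-counting stubs stand in for the LLM and vision agents a real system would presumably use. It is not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Post:
    text: str
    image_caption: str  # assumption: a caption stands in for the actual image
    target: str

@dataclass
class Argument:
    stance: str
    support: int  # toy evidence score; a real debater would produce a rationale

def retrieve_context(post: Post, knowledge_base: List[str], top_k: int = 2) -> List[str]:
    """Stage 1 (stub): rank documents by word overlap with the post and target."""
    query = set(f"{post.text} {post.target}".lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(query & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def text_analyst(post: Post, context: List[str]) -> str:
    """Stage 2 (stub): text analysis grounded in the retrieved context."""
    return f"{post.text} {' '.join(context)}"

def image_analyst(post: Post, context: List[str]) -> str:
    """Stage 2 (stub): image analysis; here it only echoes the caption."""
    return post.image_caption

def make_debater(stance: str, cues: set) -> Callable[[List[str]], Argument]:
    """Stage 3 (stub): a debater that argues for one stance by counting cue words."""
    def debate(analyses: List[str]) -> Argument:
        tokens = " ".join(analyses).lower().split()
        return Argument(stance, sum(token in cues for token in tokens))
    return debate

def reflect(arguments: List[Argument]) -> str:
    """Stage 4 (stub): adjudicate; fall back to 'neutral' when the top two sides tie."""
    ranked = sorted(arguments, key=lambda a: a.support, reverse=True)
    if len(ranked) > 1 and ranked[0].support == ranked[1].support:
        return "neutral"
    return ranked[0].stance

def detect_stance(post: Post, knowledge_base: List[str]) -> str:
    context = retrieve_context(post, knowledge_base)
    analyses = [agent(post, context) for agent in (text_analyst, image_analyst)]
    debaters = [
        make_debater("favor", {"love", "support", "great"}),
        make_debater("against", {"protest", "ban", "against"}),
    ]
    return reflect([debater(analyses) for debater in debaters])

if __name__ == "__main__":
    kb = ["bike lanes reduce traffic", "weather report"]
    post = Post(
        text="We love and support the new bike lanes",
        image_caption="smiling cyclists on a great sunny street",
        target="bike lanes",
    )
    print(detect_stance(post, kb))  # prints "favor"
```

In this toy version, a post whose text and image point in opposite directions with equal strength is resolved to "neutral" by `reflect`. That is just one simple way to handle conflicting signals; the abstract does not say how the paper's self-reflection stage adjudicates them.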

Multimodal Models | Natural Language Processing | Recommendation & Information Retrieval


Year: 2026

Venue: N/A
