The paper introduces MM-StanceDet, a multi-agent framework for multimodal stance detection that uses retrieval augmentation for contextual grounding and specialized agents for multimodal analysis. The framework incorporates a reasoning-enhanced debate stage and self-reflection to improve robustness. Experiments across five datasets show MM-StanceDet significantly outperforms existing state-of-the-art baselines, demonstrating the effectiveness of its architecture in handling complex multimodal stance detection challenges.
Multi-agent debate and self-reflection can significantly improve multimodal stance detection, even when text and images present conflicting signals.
Multimodal Stance Detection (MSD) is crucial for understanding public discourse, yet effectively fusing text and images, especially when the two carry conflicting signals, remains challenging. Existing methods often struggle with contextual grounding, ambiguity in cross-modal interpretation, and the fragility of single-pass reasoning. To address these issues, we propose Retrieval-Augmented Multi-modal Multi-agent Stance Detection (MM-StanceDet), a novel multi-agent framework that integrates Retrieval Augmentation for contextual grounding, specialized Multimodal Analysis agents for nuanced interpretation, a Reasoning-Enhanced Debate stage for exploring competing perspectives, and Self-Reflection for robust adjudication. Extensive experiments on five datasets demonstrate that MM-StanceDet significantly outperforms state-of-the-art baselines, validating the efficacy of its multi-agent architecture and structured reasoning stages in addressing complex multimodal stance challenges.
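The four stages named in the abstract (retrieval, multimodal analysis, debate, self-reflection) can be pictured as a simple sequential pipeline. The sketch below is only an illustration of that control flow under strong assumptions and is not the authors' implementation: every name in it (`Post`, `Trace`, `retrieve`, `analyze`, `debate`, `reflect`, `detect_stance`) is hypothetical, the image is replaced by a caption string, and each agent is a toy keyword scorer standing in for what would presumably be a language or vision-language model call.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy lexicons; placeholders for model-based analysis, not from the paper.
POSITIVE = {"support", "great", "love", "yes"}
NEGATIVE = {"oppose", "terrible", "hate", "no", "ban"}


@dataclass
class Post:
    text: str
    image_caption: str  # assumption: a caption stands in for the actual image
    target: str


@dataclass
class Trace:
    context: List[str] = field(default_factory=list)
    analyses: Dict[str, int] = field(default_factory=dict)
    arguments: Dict[str, float] = field(default_factory=dict)
    stance: str = "neutral"


def cue_score(s: str) -> int:
    """Count positive minus negative cue words in a whitespace-tokenized string."""
    words = s.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def retrieve(post: Post, corpus: List[str], k: int = 2) -> List[str]:
    """Stage 1 (toy retrieval): rank snippets by word overlap with text and target."""
    query = set((post.text + " " + post.target).lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def analyze(post: Post, context: List[str]) -> Dict[str, int]:
    """Stage 2: one specialized scorer per evidence source."""
    return {
        "text": cue_score(post.text),
        "image": cue_score(post.image_caption),
        "context": sum(cue_score(c) for c in context),
    }


def debate(analyses: Dict[str, int]) -> Dict[str, float]:
    """Stage 3: one debater per stance sums the evidence pointing its way.

    The 0.5 floor for 'neutral' is an arbitrary illustrative choice.
    """
    pro = float(sum(max(v, 0) for v in analyses.values()))
    con = float(sum(max(-v, 0) for v in analyses.values()))
    return {"favor": pro, "against": con, "neutral": 0.5}


def reflect(arguments: Dict[str, float], analyses: Dict[str, int]) -> str:
    """Stage 4: pick the strongest argument, but fall back to 'neutral' when
    text and image disagree and the debate ended in a tie."""
    best = max(arguments, key=arguments.get)
    conflict = analyses["text"] * analyses["image"] < 0
    tied = arguments["favor"] == arguments["against"]
    return "neutral" if conflict and tied else best


def detect_stance(post: Post, corpus: List[str]) -> Trace:
    """Run the four stages in order and keep intermediate results for inspection."""
    trace = Trace()
    trace.context = retrieve(post, corpus)
    trace.analyses = analyze(post, trace.context)
    trace.arguments = debate(trace.analyses)
    trace.stance = reflect(trace.arguments, trace.analyses)
    return trace


post = Post(
    text="I support the new policy",
    image_caption="crowd holding ban signs",
    target="policy",
)
corpus = [
    "Critics want to ban the policy",
    "The weather is nice",
    "Many people support the policy and love it",
]
grounded = detect_stance(post, corpus)
ungrounded = detect_stance(post, [])
print(grounded.stance, ungrounded.stance)
```

In this toy example the text and the image caption point in opposite directions. Without retrieved context the debate ties and the reflection step falls back to `neutral`; with context the tie is broken. This only shows how the stages could hand results to one another and says nothing about the behavior or accuracy of the actual system.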