USTC, Zhongguancun Academy | Apr 7, 2026 | arXiv:2604.05873

Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis

AI Summary

This paper introduces PRISM, a framework for multimodal sentiment analysis that addresses the limitations of early aggregation by organizing multimodal evidence in a shared prototype space, enabling structured cross-modal comparison. PRISM also applies dynamic modality reweighting during reasoning, so modality contributions are refined continuously as semantic interactions deepen. Experiments on three benchmark datasets show that PRISM outperforms representative baselines.

Key Contribution

Forget monolithic sentiment vectors: PRISM adaptively fuses multimodal cues by comparing them in a shared prototype space, and it outperforms representative baselines on three benchmark datasets.

Abstract

Multimodal sentiment analysis (MSA) aims to predict human sentiment from textual, acoustic, and visual information in videos. Recent studies improve multimodal fusion by modeling modality interaction and assigning different modality weights. However, they usually compress diverse sentiment cues into a single compact representation before sentiment reasoning. This early aggregation makes it difficult to preserve the internal structure of sentiment evidence, where different cues may complement, conflict with, or differ in reliability from each other. In addition, modality importance is often determined only once during fusion, so later reasoning cannot further adjust modality contributions. To address these issues, we propose PRISM, a framework that unifies structured affective extraction and adaptive modality evaluation. PRISM organizes multimodal evidence in a shared prototype space, which supports structured cross-modal comparison and adaptive fusion. It further applies dynamic modality reweighting during reasoning, allowing modality contributions to be continuously refined as semantic interactions become deeper. Experiments on three benchmark datasets show that PRISM outperforms representative baselines.
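The two ideas in the abstract, comparing modalities in a shared prototype space and re-estimating modality weights during reasoning, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the soft-assignment step, the agreement-based reweighting rule, the fixed number of refinement steps, and every function name below are assumptions chosen only to make the concept concrete.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prototype_profile(feature, prototypes):
    """Soft-assign one modality feature to the shared prototypes (assumed rule)."""
    return softmax([dot(feature, p) for p in prototypes])

def fuse(profiles, weights):
    """Weighted average of the per-modality prototype profiles."""
    k = len(next(iter(profiles.values())))
    return [sum(weights[m] * profiles[m][i] for m in profiles) for i in range(k)]

def reweight(profiles, fused):
    """Hypothetical rule: modalities that agree with the fused profile gain weight."""
    names = list(profiles)
    scores = softmax([dot(profiles[m], fused) for m in names])
    return dict(zip(names, scores))

def prism_like_forward(features, prototypes, steps=3):
    """Fuse in prototype space, refining modality weights at every step."""
    profiles = {m: prototype_profile(f, prototypes) for m, f in features.items()}
    weights = {m: 1.0 / len(profiles) for m in profiles}  # start uniform
    for _ in range(steps):
        fused = fuse(profiles, weights)
        weights = reweight(profiles, fused)
    return fuse(profiles, weights), weights

# Toy input: two prototypes and three modality features (all values made up).
prototypes = [[1.0, 0.0], [0.0, 1.0]]
features = {"text": [2.0, 0.0], "audio": [1.5, 0.5], "vision": [0.0, 2.0]}
fused, weights = prism_like_forward(features, prototypes)
```

In this toy input the text and audio features point toward the first prototype while the vision feature points toward the second, so the loop shifts weight away from vision over successive steps. A one-shot fusion scheme would instead fix the weights before the loop and never revise them, which is the limitation the abstract describes.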

Multimodal Models, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 56

Year: 2026

Venue: N/A
