Feb 23, 2026

arXiv:2602.19585

Tri-Subspaces Disentanglement for Multimodal Sentiment Analysis

Chunlei Meng, Jiabin Luo, Zhenglin Yan, Zheng Yan, Zhenyu Yu, R. Fu, Rong Fu, Zhongxue Gan, Chun Ouyang

AI Summary

The paper introduces Tri-Subspace Disentanglement (TSD), a novel framework for multimodal sentiment analysis that factorizes features into common, submodally-shared (pairwise), and private subspaces. This approach addresses the limitations of existing methods that overlook signals shared only by certain modality pairs, thereby improving the expressiveness of multimodal representations. Experiments on CMU-MOSI and CMU-MOSEI datasets demonstrate state-of-the-art performance, with TSD achieving 0.691 MAE on CMU-MOSI and 54.9% ACC-7 on CMU-MOSEI.
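The two reported metrics can be illustrated with a short sketch. It assumes the convention commonly used for CMU-MOSI and CMU-MOSEI, where sentiment scores lie in [-3, 3] and ACC-7 compares predictions and labels after clipping and rounding to the nearest integer class. The paper's own evaluation script is not shown here, so the function names and the rounding rule below are assumptions.

```python
def mae(preds, labels):
    """Mean absolute error between predicted and true sentiment scores."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)


def to_seven_class(score):
    """Map a continuous score to one of 7 integer classes in [-3, 3].

    Assumption: clip to [-3, 3], then round. Python's round() resolves
    exact .5 ties to the nearest even integer.
    """
    clipped = max(-3.0, min(3.0, score))
    return int(round(clipped))


def acc7(preds, labels):
    """Fraction of samples whose rounded prediction matches the rounded label."""
    hits = sum(to_seven_class(p) == to_seven_class(y) for p, y in zip(preds, labels))
    return hits / len(labels)


# Toy values for illustration only, not taken from the paper.
toy_preds = [2.6, -0.4, 1.2, -3.5]
toy_labels = [3.0, 0.0, 2.0, -3.0]
print(mae(toy_preds, toy_labels), acc7(toy_preds, toy_labels))
```

Lower MAE is better, while higher ACC-7 is better, which is why the two headline numbers move in opposite directions.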

Key Contribution

TSD shows that disentangling features into common, pairwise (submodally-shared), and private subspaces improves multimodal sentiment analysis, reaching state-of-the-art results on CMU-MOSI and CMU-MOSEI.

Abstract

Multimodal Sentiment Analysis (MSA) integrates language, visual, and acoustic modalities to infer human sentiment. Most existing methods either focus on globally shared representations or modality-specific features, while overlooking signals that are shared only by certain modality pairs. This limits the expressiveness and discriminative power of multimodal representations. To address this limitation, we propose a Tri-Subspace Disentanglement (TSD) framework that explicitly factorizes features into three complementary subspaces: a common subspace capturing global consistency, submodally-shared subspaces modeling pairwise cross-modal synergies, and private subspaces preserving modality-specific cues. To keep these subspaces pure and independent, we introduce a decoupling supervisor together with structured regularization losses. We further design a Subspace-Aware Cross-Attention (SACA) fusion module that adaptively models and integrates information from the three subspaces to obtain richer and more robust representations. Experiments on CMU-MOSI and CMU-MOSEI demonstrate that TSD achieves state-of-the-art performance across all key metrics, reaching 0.691 MAE on CMU-MOSI and 54.9% ACC-7 on CMU-MOSEI, and also transfers well to multimodal intent recognition tasks. Ablation studies confirm that tri-subspace disentanglement and SACA jointly enhance the modeling of multi-granular cross-modal sentiment cues.
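To make the tri-subspace layout concrete, the following is a minimal sketch rather than the authors' implementation. With three modalities, the factorization yields one common subspace, three pairwise (submodally-shared) subspaces, and three private subspaces. The abstract does not specify the regularization losses, so `independence_penalty` below is a hypothetical stand-in that penalizes squared cosine similarity between subspace vectors. All names are illustrative.

```python
from itertools import combinations
import math

MODALITIES = ("language", "visual", "acoustic")


def subspace_names(modalities=MODALITIES):
    """List the subspaces implied by the abstract: common, pairwise, private."""
    common = ["common"]
    pairwise = ["shared:" + "+".join(pair) for pair in combinations(modalities, 2)]
    private = ["private:" + m for m in modalities]
    return common + pairwise + private


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def independence_penalty(features):
    """Hypothetical regularizer: mean squared cosine similarity over all
    pairs of subspace vectors. It is 0.0 when all vectors are mutually
    orthogonal and 1.0 when they are all collinear."""
    pairs = list(combinations(features.values(), 2))
    return sum(cosine(u, v) ** 2 for u, v in pairs) / len(pairs)


# Toy feature vectors, one per subspace, for illustration only.
names = subspace_names()
toy = {name: [1.0 if i == j else 0.0 for j in range(len(names))]
       for i, name in enumerate(names)}
print(names)
print(independence_penalty(toy))
```

In a trained model each subspace would be produced by a learned encoder, and a penalty of this kind would be one term in the training objective alongside the task loss.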

Multimodal Models, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 31

Year: 2026

Venue: N/A
