ZJU | May 28, 2026 | arXiv:2605.29590

State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition

Zhaoyan Pan, Xiangdong Li, Wen Wu, Meng-Yao Ma, Ye Lou, Ji Zhou, Jia Pan, Wei Zhang

AI Summary

This paper introduces CoRe-KD, a knowledge distillation framework for conversational multimodal emotion recognition (MER) that improves robustness to missing modalities. CoRe-KD uses a complete-view teacher model to provide prediction-level, fused-state, and modality-specific state references that guide an incomplete-view student. The method also incorporates a nonverbal conflict exposure strategy to mitigate bias from nonverbal cues that conflict with the target utterance. Experiments on the IEMOCAP and MELD datasets demonstrate consistent performance gains under missing-modality conditions.

Key Contribution

Improves the robustness of conversational emotion recognition when modalities are missing or conflicting by distilling prediction-level, fused-state, and modality-specific references from a complete-view teacher into an incomplete-view student.

Abstract

Conversational multimodal emotion recognition (MER) requires reliable prediction when language, acoustic, or visual observations are missing or unreliable. Many missing-modality methods reconstruct absent inputs, yet such recovery can be non-unique in dialogue context, and nonverbal cues may conflict with the target utterance. To this end, we propose CoRe-KD (Complete-view Reference-guided Knowledge Distillation), a state-anchored, conflict-regularized complete-view distillation framework for robust conversational MER. A complete-view teacher provides structured references, including prediction-level references, fused states, and modality-specific states. Complete-view State Anchoring (CSA) aligns incomplete-view student predictions and states with these references, while Nonverbal Conflict Exposure (NCE) trains on target-preserving nonverbal conflict views to reduce donor-label bias. Experiments on IEMOCAP and MELD, with CMU-MOSEI as a supplementary utterance-level check, show consistent gains under fixed- and random-missing protocols. Comprehensive ablation studies and further analyses support the role of CSA and the complementary effect of NCE.
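The anchoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract does not give the loss terms, so the temperature-scaled KL divergence, the mean-squared-error state terms, the loss weights, the choice to align only the modality states the student actually observes, and all names (`anchoring_loss`, `random_missing_mask`, the dictionary layout) are assumptions made for illustration.

```python
import math
import random

MODALITIES = ("language", "acoustic", "visual")


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def random_missing_mask(rng, drop_prob=0.5):
    """Hypothetical random-missing view: drop each modality independently,
    but always keep at least one so the student has some input."""
    mask = {m: rng.random() >= drop_prob for m in MODALITIES}
    if not any(mask.values()):
        mask[rng.choice(MODALITIES)] = True
    return mask


def anchoring_loss(teacher, student, mask, temperature=2.0,
                   w_pred=1.0, w_fused=1.0, w_mod=1.0):
    """Assumed three-part objective: prediction-level, fused-state, and
    modality-specific alignment of the student to the complete-view teacher."""
    p_teacher = softmax(teacher["logits"], temperature)
    p_student = softmax(student["logits"], temperature)
    pred_term = kl_divergence(p_teacher, p_student)

    fused_term = mse(teacher["fused"], student["fused"])

    # Assumption: only modalities present in the student's view are aligned.
    present = [m for m in MODALITIES if mask[m]]
    mod_term = sum(mse(teacher["states"][m], student["states"][m])
                   for m in present) / len(present)

    return w_pred * pred_term + w_fused * fused_term + w_mod * mod_term


if __name__ == "__main__":
    rng = random.Random(0)
    teacher = {
        "logits": [2.0, 0.0, -1.0],
        "fused": [0.5, -0.2, 0.1],
        "states": {m: [0.3, 0.1] for m in MODALITIES},
    }
    student = {
        "logits": [0.0, 0.0, 0.0],
        "fused": [0.0, 0.0, 0.0],
        "states": {m: [0.0, 0.0] for m in MODALITIES},
    }
    mask = random_missing_mask(rng)
    print(mask, anchoring_loss(teacher, student, mask))
```

In this sketch the loss is zero when the student reproduces the teacher's predictions and states exactly, and it grows as the incomplete-view student drifts from the complete-view references. In practice the teacher outputs would be treated as fixed targets, and this term would be added to the usual classification loss.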

Inference & Quantization | Multimodal Models | Natural Language Processing

