Mar 12, 2026 · arXiv:2603.11468

Stage-Adaptive Reliability Modeling for Continuous Valence-Arousal Estimation

Yubeen Lee, Sangeun Lee, Junyeop Cha, Eunil Park

AI Summary

The paper introduces SAGE, a stage-adaptive reliability modeling framework for continuous valence-arousal estimation that addresses the challenge of inconsistent modality reliability in real-world audio-visual signals. SAGE estimates and calibrates modality-wise confidence during multimodal integration, dynamically rebalancing audio and visual representations based on their stage-dependent informativeness. Experiments on the Aff-Wild2 benchmark show that SAGE improves concordance correlation coefficient scores compared to existing multimodal fusion approaches, demonstrating the effectiveness of reliability-driven modeling.

Key Contribution

By explicitly modeling and adapting to the reliability of audio and visual signals at different interaction stages, SAGE achieves more stable emotion estimation under cross-modal noise and occlusion.
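The page does not say how SAGE computes or calibrates its confidence scores. The following is therefore only a generic sketch of reliability-weighted audio-visual fusion in plain Python, not the authors' method. It assumes per-frame audio and visual feature vectors of equal dimension, plus per-frame confidence logits supplied by a separate estimator, which mirrors the framework's separation of reliability estimation from feature representation. The function names, the softmax gating over the two modalities, and the `temperature` parameter standing in for "calibration" are all hypothetical choices.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a larger temperature gives flatter weights."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reliability_weighted_fusion(
    audio: Sequence[Sequence[float]],
    visual: Sequence[Sequence[float]],
    audio_conf: Sequence[float],
    visual_conf: Sequence[float],
    temperature: float = 1.0,
) -> List[List[float]]:
    """Hypothetical sketch, not SAGE's actual fusion module.

    audio[t] and visual[t] are same-dimensional feature vectors for frame t.
    audio_conf[t] and visual_conf[t] are unnormalized reliability logits that,
    in this sketch, come from a separate estimator rather than from the
    features themselves.
    """
    if not (len(audio) == len(visual) == len(audio_conf) == len(visual_conf)):
        raise ValueError("all inputs must have one entry per frame")
    fused: List[List[float]] = []
    for a_t, v_t, ca, cv in zip(audio, visual, audio_conf, visual_conf):
        if len(a_t) != len(v_t):
            raise ValueError("audio and visual features must share a dimension")
        # Per-frame modality weights that sum to 1.
        w_a, w_v = softmax([ca, cv], temperature)
        fused.append([w_a * a + w_v * v for a, v in zip(a_t, v_t)])
    return fused


if __name__ == "__main__":
    audio = [[1.0, 0.0], [1.0, 0.0]]
    visual = [[0.0, 1.0], [0.0, 1.0]]
    # Frame 0: both modalities equally trusted. Frame 1: visual stream
    # assumed occluded, so its confidence logit is very low.
    fused = reliability_weighted_fusion(audio, visual, [0.0, 0.0], [0.0, -10.0])
    print(fused[0])  # [0.5, 0.5]
    print(fused[1])  # close to [1.0, 0.0]: audio dominates
```

Because the weights are recomputed for every frame, a modality whose confidence drops (for example, the visual stream during an occlusion) is down-weighted only for the frames where it is unreliable, while a higher `temperature` pulls the weights back toward an even split.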

Abstract

Continuous valence-arousal estimation in real-world environments is challenging due to inconsistent modality reliability and interaction-dependent variability in audio-visual signals. Existing approaches primarily focus on modeling temporal dynamics, often overlooking the fact that modality reliability can vary substantially across interaction stages. To address this issue, we propose SAGE, a Stage-Adaptive reliability modeling framework that explicitly estimates and calibrates modality-wise confidence during multimodal integration. SAGE introduces a reliability-aware fusion mechanism that dynamically rebalances audio and visual representations according to their stage-dependent informativeness, preventing unreliable signals from dominating the prediction process. By separating reliability estimation from feature representation, the proposed framework enables more stable emotion estimation under cross-modal noise, occlusion, and varying interaction conditions. Extensive experiments on the Aff-Wild2 benchmark demonstrate that SAGE consistently improves concordance correlation coefficient scores compared with existing multimodal fusion approaches, highlighting the effectiveness of reliability-driven modeling for continuous affect prediction.
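The concordance correlation coefficient (CCC) rewards predictions that both correlate with the annotations and match their scale and offset: CCC = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2). Below is a minimal stdlib-only sketch of Lin's CCC. It uses population (divide-by-N) moments; the function name and the convention for two identical constant series are choices made here, not taken from the paper.

```python
from typing import Sequence


def concordance_cc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient with population moments."""
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and the same length")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0.0:
        # Both series constant and equal: treated as perfect agreement
        # (a convention chosen for this sketch).
        return 1.0
    return 2.0 * cov / denom


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0]
    print(concordance_cc([1.0, 2.0, 3.0], target))  # 1.0
    print(concordance_cc([2.0, 3.0, 4.0], target))  # about 0.571 (offset by +1)
```

Pearson correlation would score the shifted prediction as 1.0, whereas CCC penalizes the constant offset. For the valence-arousal track, Aff-Wild2 challenge evaluations commonly average the valence and arousal CCCs into a single score.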

Multimodal Models · Speech & Audio

Citation Metrics

Citations: 0
Influential citations: 0
References: 46
Year: 2026
Venue: N/A
