This paper introduces a Motion-Emotion Feature Decoupling Network (MEDN) to address the challenge of visually similar micro-expressions that belong to opposite emotional categories. MEDN uses a dual-branch architecture to separately extract motion features (via an AU-detection task and an orthogonal loss) and emotion features (via a Sparse Emotion Vision Transformer). Experiments on three benchmark datasets indicate that MEDN effectively decouples motion and emotion features, leading to improved micro-expression recognition performance.
Micro-expressions that look identical can convey opposite emotions, and MEDN teases apart motion and emotion cues to spot the difference.
Unlike macro-expressions, micro-expressions do not follow a strictly consistent mapping between emotions and Action Units (AUs). As a result, some micro-expressions share identical AUs yet represent completely opposite emotional categories, which makes them highly visually similar. Existing micro-expression recognition (MER) methods mostly rely on explicit facial motion cues (e.g., optical flow, frame differences, AU features) while ignoring implicit emotion information. To tackle this issue, this paper presents a Motion-Emotion Feature Decoupling Network (MEDN) for MER. We design a dual-branch framework to extract motion and emotion features separately. In the motion branch, an AU-detection task restricts features to the explicit motion domain, and an orthogonal loss is adopted to reduce motion-emotion feature coupling. For implicit emotion modeling, we propose a Sparse Emotion Vision Transformer (SEVit) that sparsifies spatial tokens at multi-scale sparsity rates to highlight local temporal variations. A Collaborative Fusion Module (CoFM) is further developed to adaptively fuse the disentangled motion and emotion features. Extensive experiments on three benchmark datasets validate that MEDN effectively decouples motion and emotion features and achieves superior recognition performance, offering a new perspective for enhancing recognition accuracy and generalization.
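To make the idea of an orthogonal loss concrete, the following is a minimal sketch and not the formulation used in MEDN, which the abstract does not specify. It assumes one common choice: the mean squared cosine similarity between paired motion and emotion feature vectors, which is 0 when every pair is orthogonal and 1 when every pair is parallel. The function names `cosine_similarity` and `orthogonal_loss` and the toy feature values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    eps = 1e-12  # guards against division by zero for all-zero vectors
    return dot / (max(norm_u, eps) * max(norm_v, eps))


def orthogonal_loss(motion_feats, emotion_feats):
    """Mean squared cosine similarity over a batch of feature pairs.

    Assumed form, for illustration only: the loss is 0 when each motion
    vector is orthogonal to its paired emotion vector, and 1 when each
    pair is parallel or anti-parallel.
    """
    if not motion_feats or len(motion_feats) != len(emotion_feats):
        raise ValueError("batches must be non-empty and of equal size")
    sims = [cosine_similarity(m, e) for m, e in zip(motion_feats, emotion_feats)]
    return sum(s * s for s in sims) / len(sims)


# Toy batch of two samples with 2-D features (hypothetical values).
motion = [[1.0, 0.0], [0.0, 2.0]]
emotion_decoupled = [[0.0, 3.0], [1.0, 0.0]]   # orthogonal to the motion vectors
emotion_coupled = [[2.0, 0.0], [0.0, -1.0]]    # parallel or anti-parallel to them

loss_decoupled = orthogonal_loss(motion, emotion_decoupled)
loss_coupled = orthogonal_loss(motion, emotion_coupled)
```

Under this assumed form, minimizing the loss pushes the two branches toward non-overlapping feature directions. Squaring the similarity penalizes anti-parallel features as heavily as parallel ones, since both indicate coupling.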