HIT | Key Lab of MIMS | Northwestern | School of Computer Science and Engineering
May 6, 2026 | arXiv:2605.04439

A cross-modal network for facial expression recognition

Chunwei Tian, Jingyuan Xie, Qi Zhang, Chao Li, Wangmeng Zuo, Shichao Zhang

AI Summary

This paper introduces CMNet, a cross-modal network for facial expression recognition that leverages face symmetry and half-face alignment to extract complementary features. CMNet incorporates a salient facial information refinement module to improve classifier stability by focusing on key expression areas. Experiments demonstrate that CMNet achieves superior performance compared to SCN and LAENet-SA, indicating the effectiveness of integrating biological and structural information with refinement techniques.

Key Contribution

Combining face symmetry with half-face alignment yields facial expression recognition that outperforms existing methods such as SCN and LAENet-SA.

Abstract

Deep neural networks enriched with structural information have been widely employed for facial expression recognition. However, these methods often rely on hierarchical information rather than on properties of the face itself. In this paper, we propose a cross-modal network with strong biological and structural information for facial expression recognition (CMNet). CMNet exploits face symmetry to learn expression information separately from the whole face and from the left and right half faces, extracting complementary facial features. To prevent negative effects from fusing biological and structural information, a salient facial information refinement module extracts salient facial expression information, improving the stability of the resulting expression classifier. To reduce reliance on unilateral facial features, a half-face alignment optimization mechanism aligns the expression information learned from the left and right half faces. Experimental results show that CMNet outperforms several existing methods, such as SCN and LAENet-SA, on facial expression recognition. Code is available at https://github.com/hellloxiaotian/CMNet.
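
The half-face idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that lives in the linked repository); the function names `split_half_faces` and `alignment_loss`, the center split, and the use of a mean squared difference as an alignment measure are all assumptions made for illustration only.

```python
import numpy as np


def split_half_faces(face):
    """Split an H x W (x C) face image into a left half and a mirrored right half.

    Hypothetical helper: assumes the face is roughly centered, so the vertical
    midline of the image approximates the axis of facial symmetry.
    """
    width = face.shape[1]
    half = width // 2
    left = face[:, :half]
    # Flip the right half horizontally so both halves share one orientation.
    # For odd widths the center column is dropped from both halves.
    right = face[:, width - half:][:, ::-1]
    return left, right


def alignment_loss(left_feat, right_feat):
    """Mean squared difference between two feature arrays.

    Assumed stand-in for an alignment objective; the paper's actual
    half-face alignment mechanism may differ.
    """
    left_feat = np.asarray(left_feat, dtype=np.float64)
    right_feat = np.asarray(right_feat, dtype=np.float64)
    return float(np.mean((left_feat - right_feat) ** 2))


# Toy example: a perfectly symmetric 2 x 4 "image".
face = np.array([[1, 2, 2, 1],
                 [3, 4, 4, 3]])
left, right = split_half_faces(face)
loss = alignment_loss(left, right)
```

In a real pipeline the two halves would pass through a feature extractor before being compared; here raw pixels stand in for features so the example stays self-contained.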

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 74

Year: 2026

Venue: IEEE Transactions on Image Processing
