Fujian Normal University | RMIT | Mar 2, 2026 | arXiv:2603.02123

Nano-EmoX: Unifying Multimodal Emotional Intelligence from Perception to Empathy

Jiahao Huang, Fengyan Lin, Xuechao Yang, Chen Feng, Kexin Zhu, Xu Yang, Zhide Chen

AI Summary

The paper introduces Nano-EmoX, a 2.2B-parameter multimodal language model (MLM) designed to unify affective tasks across the perception, understanding, and interaction levels, addressing the fragmented capabilities of existing affective MLMs. The authors propose a cognitively inspired three-level hierarchy to organize affective tasks and a curriculum-based training framework, P2E (Perception-to-Empathy), to improve emotional intelligence. Nano-EmoX integrates omni-modal encoders and heterogeneous adapters and achieves state-of-the-art or competitive performance on six core affective tasks, which the authors present as evidence of improved efficiency and generalization.

Key Contribution

Nano-EmoX, a 2.2B-parameter model, rivals larger models in multimodal emotional intelligence by unifying perception, understanding, and interaction through a novel training curriculum.

Abstract

The development of affective multimodal language models (MLMs) has long been constrained by a gap between low-level perception and high-level interaction, leading to fragmented affective capabilities and limited generalization. To bridge this gap, we propose a cognitively inspired three-level hierarchy that organizes affective tasks according to their cognitive depth (perception, understanding, and interaction) and provides a unified conceptual foundation for advancing affective modeling. Guided by this hierarchy, we introduce Nano-EmoX, a small-scale multitask MLM, and P2E (Perception-to-Empathy), a curriculum-based training framework. Nano-EmoX integrates a suite of omni-modal encoders, including an enhanced facial encoder and a fusion encoder, to capture key multimodal affective cues and improve cross-task transferability. The outputs are projected into a unified language space via heterogeneous adapters, empowering a lightweight language model to tackle diverse affective tasks. Concurrently, P2E progressively cultivates emotional intelligence by aligning rapid perception with chain-of-thought-driven empathy. To the best of our knowledge, Nano-EmoX is the first compact MLM (2.2B) to unify six core affective tasks across all three hierarchy levels, achieving state-of-the-art or highly competitive performance across multiple benchmarks, demonstrating excellent efficiency and generalization.
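To make the adapter step in the abstract more concrete, here is a minimal sketch of how outputs from several encoders could be projected into one shared embedding space and concatenated into a single token sequence. It is an illustration under stated assumptions, not the authors' implementation: the modality names, feature widths, the shared width `LM_DIM`, and the use of a plain linear projection for every adapter are all hypothetical, since the abstract specifies none of them.

```python
import random

# Hypothetical feature widths per encoder; the abstract does not give these.
ENCODER_DIMS = {"face": 8, "audio": 6, "fusion": 10}
LM_DIM = 4  # hypothetical width of the language model's embedding space


def make_linear_adapter(in_dim, out_dim, rng):
    """Return a random (out_dim x in_dim) matrix standing in for a trained adapter."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, features):
    """Map each feature vector (one per token) into the shared space."""
    return [[sum(w * x for w, x in zip(row, vec)) for row in weights] for vec in features]


def build_prefix(encoder_outputs, adapters):
    """Project every modality with its own adapter and concatenate the tokens."""
    prefix = []
    for name, features in encoder_outputs.items():
        prefix.extend(project(adapters[name], features))
    return prefix


rng = random.Random(0)
adapters = {name: make_linear_adapter(dim, LM_DIM, rng) for name, dim in ENCODER_DIMS.items()}
# Two dummy tokens per modality, each with that modality's own width.
encoder_outputs = {name: [[1.0] * dim for _ in range(2)] for name, dim in ENCODER_DIMS.items()}
prefix = build_prefix(encoder_outputs, adapters)
```

In this sketch every modality ends up as vectors of the same width, so a language model could consume them alongside text embeddings. The adapters in the paper may differ in architecture per modality, which is presumably what "heterogeneous" refers to.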

Multimodal Models, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
