Apr 13, 2026 | arXiv:2604.11594

HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models

Shuiyuan Wang, Hongfei Yue, Chengyou Wang, Hui Bu

AI Summary

This paper introduces HumDial-EIBench, a new benchmark for evaluating emotional intelligence in Audio Language Models (ALMs) using human-recorded dialogues from the ICASSP 2026 HumDial Challenge. The benchmark reformulates emotional tracking and causal reasoning into multiple-choice questions with adversarial distractors and includes an acoustic-semantic conflict task. Experiments on eight ALMs reveal deficiencies in multi-turn emotional tracking, causal reasoning, and robustness against acoustic-semantic conflicts, highlighting a text-dominance bias.

Key Contribution

ALMs may ace the text, but HumDial-EIBench reveals they're shockingly bad at understanding the emotional nuances of real human conversations.

Abstract

Evaluating the emotional intelligence (EI) of audio language models (ALMs) is critical. However, existing benchmarks mostly rely on synthesized speech, are limited to single-turn interactions, and depend heavily on open-ended scoring. This paper proposes HumDial-EIBench, a comprehensive benchmark for evaluating ALMs' EI. Using real-recorded human dialogues from the ICASSP 2026 HumDial Challenge, it reformulates emotional tracking and causal reasoning into multiple-choice questions with adversarial distractors, mitigating subjective scoring bias for cognitive tasks. It retains the generation of empathetic responses and introduces an acoustic-semantic conflict task to assess robustness against contradictory multimodal signals. Evaluations of eight ALMs reveal that most models struggle with multi-turn emotional tracking and implicit causal reasoning. Furthermore, all models exhibit decoupled textual and acoustic empathy, alongside a severe text-dominance bias during cross-modal conflicts.

Eval Frameworks & Benchmarks, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 30

Year: 2026

Venue: N/A
