Jun 11, 2026 · arXiv:2606.12902

PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue

Wen Zhang, Xiaocui Yang, Zhuoyue Gao, Shi Feng, Daling Wang, Yifei Zhang

AI Summary

This paper introduces PRISM, a multi-agent framework for empathetic spoken dialogue that integrates prosodic expression with semantic response generation. By decoupling speech perception, response generation, and speech synthesis, PRISM gives finer control over emotional alignment and knowledge integration. This addresses two limitations of earlier approaches: cascade pipelines discard acoustic cues during speech-to-text conversion, and end-to-end speech models offer little interpretable control over emotion and knowledge. Experiments show that PRISM consistently improves empathy, prosodic appropriateness, and text response quality on both objective and subjective metrics.

Key Contribution

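PRISM translates prosodic cues into language that a large language model can reason over, and coordinates separate agents for perception, response generation, and speech synthesis. The authors report consistent gains in emotional expression and response quality over cascade and end-to-end baselines.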

Abstract

Empathetic spoken dialogue systems require not only semantically appropriate responses but also emotionally aligned prosodic expression. However, cascade pipelines often discard acoustic cues during speech-to-text conversion, while end-to-end speech models lack interpretable control over emotion and knowledge integration. To address these challenges, we propose PRISM, a multi-agent framework for empathetic spoken dialogue that decouples speech perception, response generation, and speech synthesis into coordinated components. PRISM introduces a prosody-to-language translation mechanism to stabilize large language model reasoning and enables on-demand invocation of external knowledge tools for empathetic dialogue generation. Experimental results demonstrate that PRISM achieves consistent improvements in empathy, prosodic appropriateness, and text response generation quality across objective and subjective metrics. Our code is available at: https://github.com/Bxzfrm/PRISM.
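To make the prosody-to-language idea concrete, here is a minimal Python sketch under stated assumptions. It is not PRISM's implementation (the linked repository holds that): the feature names (`mean_pitch_hz`, `pitch_range_hz`, `energy_db`, `speech_rate_sps`), the numeric thresholds, the `coping_tips` tool, and the keyword rule in `choose_tool` are all hypothetical. It shows a perception stage summarizing acoustics as plain text, an optional on-demand knowledge lookup, and a prompt that carries both to a response-generation LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ProsodicFeatures:
    # Hypothetical utterance-level summary a perception agent might extract.
    mean_pitch_hz: float
    pitch_range_hz: float
    energy_db: float        # RMS level relative to full scale (dBFS)
    speech_rate_sps: float  # syllables per second


def _band(value: float, low: float, high: float, names: Tuple[str, str, str]) -> str:
    # Map a number to one of three words; thresholds are illustrative only.
    if value < low:
        return names[0]
    if value > high:
        return names[2]
    return names[1]


def prosody_to_language(f: ProsodicFeatures) -> str:
    """Render numeric prosody as text an LLM can read alongside the transcript."""
    pitch = _band(f.mean_pitch_hz, 120, 220, ("low", "moderate", "high"))
    contour = "flat" if f.pitch_range_hz < 40 else "varied"
    volume = _band(f.energy_db, -35, -20, ("soft", "normal", "loud"))
    tempo = _band(f.speech_rate_sps, 3.5, 5.5, ("slow", "steady", "fast"))
    return f"pitch: {pitch} and {contour}; volume: {volume}; tempo: {tempo}"


# Hypothetical knowledge tool; a real system would query a retriever or an API.
TOOLS: Dict[str, Callable[[str], str]] = {
    "coping_tips": lambda query: f"[placeholder result for: {query}]",
}


def choose_tool(transcript: str) -> Optional[str]:
    # Stand-in for the LLM deciding by itself whether a tool call is needed.
    return "coping_tips" if "stress" in transcript.lower() else None


def build_prompt(transcript: str, prosody: ProsodicFeatures) -> str:
    # Combine the transcript, the verbalized prosody, and any tool output.
    lines = [f"User said: {transcript}",
             f"Vocal cues: {prosody_to_language(prosody)}"]
    tool = choose_tool(transcript)
    if tool is not None:
        lines.append(f"Knowledge ({tool}): {TOOLS[tool](transcript)}")
    lines.append("Reply empathetically and name one emotion for the speech synthesizer.")
    return "\n".join(lines)


if __name__ == "__main__":
    feats = ProsodicFeatures(mean_pitch_hz=105, pitch_range_hz=25,
                             energy_db=-38, speech_rate_sps=3.0)
    print(build_prompt("Work has been so much stress lately.", feats))
```

The design point this illustrates is that the LLM never sees raw acoustic numbers, only a short verbal description, and the knowledge tool is consulted only when the request calls for it. The emotion named in the reply could then be passed to a separate, controllable synthesis agent.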

Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 37

Year: 2026

Venue: N/A
