This paper introduces an emotion-context-aware VR interaction pipeline that integrates real-time speech emotion recognition into LLM-based conversational agents. Prosodic cues are used to infer users' emotional states, which are then injected as explicit dialogue context to influence the agent's response. A user study (N=30) demonstrates that this approach significantly improves dialogue quality, naturalness, engagement, rapport, and human-likeness compared to agents that only process semantics.
VR agents that "listen" to your tone, not just your words, elicit significantly better user experiences.
In VR interactions with embodied conversational agents, users' emotional intent is often conveyed more by how something is said than by what is said. However, most VR agent pipelines rely on speech-to-text processing, discarding prosodic cues and often producing emotionally incongruent responses despite correct semantics. We propose an emotion-context-aware VR interaction pipeline that treats vocal emotion as explicit dialogue context in an LLM-based conversational agent. A real-time speech emotion recognition model infers users' emotional states from prosody, and the resulting emotion labels are injected into the agent's dialogue context to shape response tone and style. Results from a within-subjects VR study (N=30) show significant improvements in dialogue quality, naturalness, engagement, rapport, and human-likeness, with 93.3% of participants preferring the emotion-aware agent.
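To make the pipeline concrete, here is a minimal sketch of the emotion-injection step: a recognized emotion label is prepended to the LLM's dialogue context so the agent can adapt its tone as well as its content. This is not the authors' implementation; `recognize_emotion` and the prompt wording are hypothetical stand-ins for the paper's real-time SER model and its actual context format.

```python
# Sketch: injecting a speech-emotion label into an LLM dialogue context.
# `recognize_emotion` is a hypothetical stand-in for a real-time SER model;
# the system-prompt wording is an assumption, not the paper's exact prompt.

from typing import Dict, List


def recognize_emotion(prosody_features: List[float]) -> str:
    """Hypothetical SER model: map prosodic features to an emotion label."""
    # A real system would run a trained classifier on features such as
    # pitch, energy, and speaking rate; a fixed label is returned here
    # purely for illustration.
    return "frustrated"


def build_messages(user_text: str, emotion: str) -> List[Dict[str, str]]:
    """Inject the inferred emotion as explicit dialogue context so the
    LLM can shape its response tone, not just its semantics."""
    return [
        {
            "role": "system",
            "content": (
                "You are an embodied VR conversational agent. "
                f"The user currently sounds {emotion}; adjust your "
                "tone and style accordingly."
            ),
        },
        {"role": "user", "content": user_text},
    ]


if __name__ == "__main__":
    emotion = recognize_emotion(prosody_features=[0.2, 0.7, 0.1])
    for m in build_messages("The menu keeps closing on me.", emotion):
        print(f"{m['role']}: {m['content']}")
```

Under this framing, the emotion label is ordinary text in the context window, so the approach works with any chat-style LLM API without modifying the model itself.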