Mar 16, 2026 · arXiv:2603.15083

ReactMotion: Generating Reactive Listener Motions from Speaker Utterance

Cheng Luo, Bizhu Wu, Bing Li, Jianfeng Ren, Ruibin Bai, Rong Qu, Linlin Shen, Bernard Ghanem

AI Summary

The paper introduces the task of generating listener body motions that react to a speaker's utterance and presents ReactMotionNet, a dataset pairing speaker utterances with multiple candidate listener motions annotated for appropriateness. To handle the non-deterministic nature of listener reactions, the authors develop preference-oriented evaluation protocols and propose ReactMotion, a generative framework that jointly models text, audio, emotion, and motion and is trained with preference-based objectives. Experiments show that ReactMotion outperforms retrieval baselines and cascaded LLM-based pipelines, generating more natural, diverse, and appropriate listener motions.

Key Contribution

ReactMotion generates more natural, diverse, and appropriate listener body motions than retrieval baselines and cascaded LLM-based pipelines, and it is assessed with preference-oriented protocols that go beyond simple input-motion alignment.

Abstract

In this paper, we introduce a new task, Reactive Listener Motion Generation from Speaker Utterance, which aims to generate naturalistic listener body motions that appropriately respond to a speaker's utterance. Modeling such nonverbal listener behaviors remains underexplored and challenging because human reactions are inherently non-deterministic. To facilitate this task, we present ReactMotionNet, a large-scale dataset that pairs speaker utterances with multiple candidate listener motions annotated with varying degrees of appropriateness. This dataset design explicitly captures the one-to-many nature of listener behavior and provides supervision beyond a single ground-truth motion. Building on this design, we develop preference-oriented evaluation protocols tailored to reactive appropriateness, a property that conventional motion metrics, which focus on input-motion alignment, ignore. We further propose ReactMotion, a unified generative framework that jointly models text, audio, emotion, and motion, and is trained with preference-based objectives to encourage both appropriate and diverse listener responses. Extensive experiments show that ReactMotion outperforms retrieval baselines and cascaded LLM-based pipelines, generating more natural, diverse, and appropriate listener motions.
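The abstract does not spell out the dataset schema, the evaluation protocol, or the training objective. As a minimal sketch, assuming each utterance carries several candidate listener motions with an ordinal appropriateness grade (higher is better), the Python below shows one generic way graded one-to-many data can be turned into preference pairs, scored by a pairwise agreement rate, and fed to a Bradley-Terry style pairwise logistic loss. The class and function names (`ListenerCandidate`, `UtteranceRecord`, `preference_pairs`, `pairwise_agreement`, `pairwise_logistic_loss`), the integer grades, and the choice of loss are all hypothetical illustrations, not ReactMotionNet's format or ReactMotion's actual protocol or objective.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable


@dataclass
class ListenerCandidate:
    motion_id: str        # hypothetical ID of a candidate listener motion clip
    appropriateness: int  # hypothetical ordinal grade; higher = more appropriate


@dataclass
class UtteranceRecord:
    utterance_text: str   # the speaker utterance (audio omitted in this sketch)
    candidates: list[ListenerCandidate] = field(default_factory=list)


def preference_pairs(record: UtteranceRecord) -> list[tuple[str, str]]:
    """Return (preferred_id, dispreferred_id) for every pair with different grades.

    Candidates with equal grades produce no pair, so ties are never ordered.
    """
    pairs = []
    for a, b in combinations(record.candidates, 2):
        if a.appropriateness > b.appropriateness:
            pairs.append((a.motion_id, b.motion_id))
        elif b.appropriateness > a.appropriateness:
            pairs.append((b.motion_id, a.motion_id))
    return pairs


def pairwise_agreement(pairs: list[tuple[str, str]],
                       score: Callable[[str], float]) -> float:
    """Fraction of pairs in which `score` ranks the preferred motion strictly higher."""
    if not pairs:
        return float("nan")
    wins = sum(1 for preferred, dispreferred in pairs
               if score(preferred) > score(dispreferred))
    return wins / len(pairs)


def pairwise_logistic_loss(score_preferred: float, score_dispreferred: float) -> float:
    """Bradley-Terry style loss -log(sigmoid(s_w - s_l)), computed stably."""
    margin = score_preferred - score_dispreferred
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite to avoid overflow in exp(-margin).
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Toy record with made-up grades, for illustration only.
    record = UtteranceRecord(
        "I finally got the job!",
        [ListenerCandidate("m1", 3), ListenerCandidate("m2", 1), ListenerCandidate("m3", 2)],
    )
    pairs = preference_pairs(record)
    print(pairs)  # [('m1', 'm2'), ('m1', 'm3'), ('m3', 'm2')]
    toy_scores = {"m1": 0.9, "m2": 0.5, "m3": 0.2}  # hypothetical scorer outputs
    print(round(pairwise_agreement(pairs, toy_scores.get), 3))  # 0.667
    print(round(pairwise_logistic_loss(0.0, 0.0), 4))  # 0.6931
```

Skipping tied candidates is one design choice among several; a variant could weight each pair by the gap between grades so that clearly inappropriate motions are penalized more strongly.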

Multimodal Models · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
