Corresponding author | ZJU | May 6, 2026 | arXiv:2605.04662

Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling

Xuhai Chen, Zhi Cen, Huaijin Pi, Sida Peng, Xiaowei Zhou

AI Summary

This paper introduces a two-stage framework for duet dance motion synthesis, addressing challenges of limited data and complex human interactions. The first stage uses a motion VQ-VAE with body-part specific encoders and a joint decoder to improve motion representation and consistency. The second stage employs a contact-aware diffusion model that jointly generates motion and a contact matrix, explicitly modeling interactions between dancers.
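The first stage can be illustrated with a minimal sketch of per-part vector quantization. This is an illustration under stated assumptions, not the authors' implementation: the part names (`upper`, `lower`), the two-dimensional latents, the toy codebooks, and the helper names `nearest_code` and `quantize_parts` are all hypothetical. Each body part looks up its nearest entry in its own codebook, and the quantized codes are concatenated into the input that a single joint decoder would consume.

```python
import math


def nearest_code(vector, codebook):
    """Return (index, code) of the codebook entry closest to `vector`."""
    index = min(range(len(codebook)), key=lambda k: math.dist(vector, codebook[k]))
    return index, codebook[index]


def quantize_parts(latents, codebooks):
    """Quantize each body part with its own codebook.

    Returns the per-part code indices and the concatenated quantized
    vector that one joint decoder would take as input.
    """
    indices, joint_input = {}, []
    for part, vector in latents.items():
        idx, code = nearest_code(vector, codebooks[part])
        indices[part] = idx
        joint_input.extend(code)
    return indices, joint_input


# Hypothetical toy codebooks: one per body part, with different sizes.
codebooks = {
    "upper": [(0.0, 0.0), (1.0, 1.0)],
    "lower": [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)],
}
# Hypothetical encoder outputs for one time step.
latents = {"upper": (0.9, 0.8), "lower": (0.4, 0.6)}

indices, joint_input = quantize_parts(latents, codebooks)
```

Decoding all parts together, rather than with one decoder per part, is what lets the decoder account for dependencies across body parts, as the summary describes.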

Key Contribution

Explicitly modeling inter-dancer contact improves duet dance motion synthesis, yielding better interaction fidelity and rhythmic synchronization than Duolando.

Abstract

Generating realistic reactive motions, in which one person reacts to the fixed motions of others, is challenging due to strict interaction constraints and a limited feasible solution space. This paper focuses on a typical scenario: duet dance, where high-quality data is scarce, motion patterns are complex, and the details of human interactions are both intricate and abundant. To tackle these challenges, we propose a novel two-stage framework. In the first stage, we introduce a motion VQ-VAE with separate body-part encoders and a joint decoder, enabling specialized codebooks to enhance representation capacity while dynamically modeling dependencies across body parts during decoding, thereby preventing inconsistencies in the generated motions. In the second stage, we propose a contact-aware diffusion model for reactive motion generation that jointly generates motion and a contact matrix between individuals, enabling explicit interaction modeling and providing guidance toward more precise and constrained interaction dynamics during sampling. Experiments show that our method outperforms Duolando with lower $\text{FID}_k$ (8.89 vs. 25.30) and $\text{FID}_{cd}$ (8.01 vs. 9.97), as well as a higher BED (0.4606 vs. 0.2858), indicating improved interaction fidelity and rhythmic synchronization.
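To make the idea of a contact matrix concrete, here is a minimal sketch of how one could be computed for a single frame. The abstract does not give the exact definition, so this assumes a binary matrix obtained by thresholding pairwise joint distances; the function name `contact_matrix`, the 0.1 threshold, and the toy coordinates are hypothetical.

```python
import math


def contact_matrix(joints_a, joints_b, threshold=0.1):
    """Binary contact matrix for one frame.

    Entry [i][j] is 1 when joint i of dancer A lies within `threshold`
    (same units as the inputs) of joint j of dancer B, else 0.
    """
    return [
        [1 if math.dist(pa, pb) < threshold else 0 for pb in joints_b]
        for pa in joints_a
    ]


# Toy frame with three joints per dancer (made-up coordinates, in metres).
leader = [(0.0, 1.0, 0.0), (0.3, 1.2, 0.0), (0.6, 1.0, 0.0)]
follower = [(0.62, 1.0, 0.05), (1.0, 1.2, 0.0), (1.3, 1.0, 0.0)]

matrix = contact_matrix(leader, follower)
```

In this toy frame only the leader's third joint and the follower's first joint are close enough to count as touching, so `matrix` has a single nonzero entry at `[2][0]`. A model that generates such a matrix alongside the motion has an explicit signal about which joint pairs should be in contact.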

Computer Vision, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
