This paper introduces a two-stage framework for duet dance motion synthesis, addressing challenges of limited data and complex human interactions. The first stage uses a motion VQ-VAE with body-part specific encoders and a joint decoder to improve motion representation and consistency. The second stage employs a contact-aware diffusion model that jointly generates motion and a contact matrix, explicitly modeling interactions between dancers.
Explicitly modeling inter-dancer contact makes synthesized duet dance motion more realistic, with better interaction fidelity and rhythmic synchronization than Duolando.
Generating realistic reactive motions, in which one person reacts to the fixed motions of others, is challenging due to strict interaction constraints and a limited feasible solution space. This paper focuses on a typical scenario: duet dance, where high-quality data is scarce, motion patterns are complex, and the details of human interactions are both intricate and abundant. To tackle these challenges, we propose a novel two-stage framework. In the first stage, we introduce a motion VQ-VAE with separate body-part encoders and a joint decoder, enabling specialized codebooks to enhance representation capacity while dynamically modeling dependencies across body parts during decoding, thereby preventing inconsistencies in the generated motions. In the second stage, we propose a contact-aware diffusion model for reactive motion generation that jointly generates motion and a contact matrix between individuals, enabling explicit interaction modeling and providing guidance toward more precise and constrained interaction dynamics during sampling. Experiments show that our method outperforms Duolando with lower $\text{FID}_k$ (8.89 vs. 25.30) and $\text{FID}_{cd}$ (8.01 vs. 9.97), as well as a higher BED (0.4606 vs. 0.2858), indicating improved interaction fidelity and rhythmic synchronization.
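To make the idea of a contact matrix between two dancers concrete, here is a minimal sketch for a single frame. The abstract does not say how the contact matrix is defined, so everything below is an assumption made for illustration: the function name `contact_matrix`, the binary encoding, the Euclidean distance criterion, and the 0.1 threshold are all hypothetical and are not taken from the paper.

```python
import math

def contact_matrix(joints_a, joints_b, threshold=0.1):
    """Return a binary matrix C where C[i][j] == 1 if joint i of dancer A
    lies within `threshold` (Euclidean distance) of joint j of dancer B.

    joints_a, joints_b: lists of (x, y, z) joint positions for one frame.
    The threshold value and the binary encoding are illustrative assumptions.
    """
    return [
        [1 if math.dist(pa, pb) < threshold else 0 for pb in joints_b]
        for pa in joints_a
    ]

# Hypothetical two-joint skeletons for a single frame.
leader = [(0.0, 0.0, 0.0), (0.5, 1.0, 0.0)]
follower = [(0.05, 0.0, 0.0), (2.0, 1.0, 0.0)]

matrix = contact_matrix(leader, follower)
print(matrix)  # [[1, 0], [0, 0]]
```

In this toy frame only the first joints of the two dancers are close enough to count as touching. A model that generates such a matrix alongside the motion could, in principle, use it to steer sampling toward poses that honor the predicted contacts, which is the role the abstract attributes to its contact-aware diffusion stage.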