Ulsan National Institute of Science and Technology | Mar 9, 2026 | arXiv:2603.08023

Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model

Sangjun Park, Sangjune Park, Inhyeok Choi, Donghyeon Soon, Youn-Sug Jeon, Youngwoo Jeon, Kyungdon Joo

AI Summary

This paper introduces MambaDance, a novel dance generation framework that replaces Transformers with Mamba within a two-stage diffusion architecture. It incorporates a Gaussian-based beat representation to explicitly guide the decoding of dance sequences, addressing the limitations of prior methods in capturing the sequential, rhythmical, and music-synchronized aspects of dance. Experiments on AIST++ and FineDance datasets demonstrate that MambaDance generates more plausible and characteristic dance movements, especially for longer sequences, compared to Transformer-based approaches.

Key Contribution

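Replacing the Transformer with Mamba in a two-stage diffusion architecture, together with a Gaussian-based beat representation, produces more plausible dance movements than prior methods on AIST++ and FineDance, consistently from short to long sequences.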

Abstract

Dance is a form of human motion characterized by emotional expression and communication, playing a role in various fields such as music, virtual reality, and content creation. Existing methods for dance generation often fail to adequately capture the inherently sequential, rhythmical, and music-synchronized characteristics of dance. In this paper, we propose MambaDance, a new dance generation approach that leverages a Mamba-based diffusion model. Mamba, well-suited to handling long and autoregressive sequences, is integrated into our two-stage diffusion architecture, replacing the off-the-shelf Transformer. Additionally, considering the critical role of musical beats in dance choreography, we propose a Gaussian-based beat representation to explicitly guide the decoding of dance sequences. Experiments on the AIST++ and FineDance datasets across sequence lengths show that, compared to previous methods, our proposed method generates plausible dance movements that reflect these essential characteristics, consistently from short to long dances. Additional qualitative results and demo videos are available at https://vision3d-lab.github.io/mambadance.
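The abstract does not spell out how the Gaussian-based beat representation is computed. As a rough illustration only, one way such a signal could be built is to place a Gaussian bump on every beat frame, so that frames near a beat receive a value close to 1 and frames far from any beat a value close to 0. The function name `gaussian_beat_signal`, the width parameter `sigma`, and the choice to combine overlapping bumps by taking the maximum are all assumptions made for this sketch, not the authors' formulation.

```python
import math


def gaussian_beat_signal(beat_frames, num_frames, sigma=2.0):
    """Return one value in [0, 1] per frame, peaking at 1.0 on each beat frame.

    Hypothetical helper: the paper's exact definition is not given in the abstract.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    signal = [0.0] * num_frames
    for t in range(num_frames):
        for b in beat_frames:
            # Gaussian bump centred on beat frame b.
            value = math.exp(-((t - b) ** 2) / (2.0 * sigma ** 2))
            # Assumption: overlapping bumps are merged by keeping the strongest.
            if value > signal[t]:
                signal[t] = value
    return signal


# Two beats at frames 5 and 15 in a 20-frame clip.
signal = gaussian_beat_signal(beat_frames=[5, 15], num_frames=20, sigma=2.0)
```

Compared with a binary on-beat/off-beat indicator, a smooth signal of this kind also tells the model how close each frame is to the nearest beat. A larger `sigma` widens that neighbourhood.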

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 35

Year: 2026

Venue: N/A
