Beijing Forestry University · Chengdu Minto Tech · Horizon Robotics · RUC · ZJU
Apr 14, 2026 · arXiv:2604.12856

PianoFlow: Music-Aware Streaming Piano Motion Generation with Bimanual Coordination

Kai Ruan, Jiayi Han, Kaiyue Zhou, Gaoang Wang

AI Summary

PianoFlow, a flow-matching framework, generates bimanual piano motion by distilling MIDI priors into an audio-driven model for improved semantic understanding. It uses an asymmetric role-gated interaction module to capture dynamic cross-hand coordination and an autoregressive flow continuation scheme for real-time streaming generation. Experiments on PianoMotion10M show that PianoFlow outperforms prior methods while running inference over 9× faster.

Key Contribution

PianoFlow enables real-time streaming piano motion generation: it accelerates inference by over 9× compared to previous methods while improving quality by distilling MIDI priors into an audio-driven model.

Abstract

Audio-driven bimanual piano motion generation requires precise modeling of complex musical structures and dynamic cross-hand coordination. However, existing methods often rely on acoustic-only representations lacking symbolic priors, employ inflexible interaction mechanisms, and are limited to computationally expensive short-sequence generation. To address these limitations, we propose PianoFlow, a flow-matching framework for precise and coordinated bimanual piano motion synthesis. Our approach strategically leverages MIDI as a privileged modality during training, distilling these structured musical priors to achieve deep semantic understanding while maintaining audio-only inference. Furthermore, we introduce an asymmetric role-gated interaction module to explicitly capture dynamic cross-hand coordination through role-aware attention and temporal gating. To enable real-time streaming generation for arbitrarily long sequences, we design an autoregressive flow continuation scheme that ensures seamless cross-chunk temporal coherence. Extensive experiments on the PianoMotion10M dataset demonstrate that PianoFlow achieves superior quantitative and qualitative performance, while accelerating inference by over 9× compared to previous methods.
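The abstract does not spell out the sampling procedure, so the following is only a generic sketch of chunk-wise flow-matching sampling, not PianoFlow's actual scheme. It assumes a straight-line probability path integrated with fixed-step Euler updates, and it assumes each chunk is conditioned on the last pose of the previous chunk. The names `toy_velocity`, `sample_chunk`, and `stream` are hypothetical, and the "audio features" are treated as per-frame pose deltas purely so the result can be checked.

```python
import numpy as np

def toy_velocity(x, t, target):
    # Conditional flow-matching velocity for a straight path ending at `target`.
    # A trained network would predict this from audio; here the target is given
    # directly (an assumption) so the example stays checkable.
    return (target - x) / (1.0 - t)

def sample_chunk(audio_chunk, last_pose, rng, steps=8):
    # Hypothetical conditioning: the toy target continues from the final pose
    # of the previous chunk, treating audio features as per-frame pose deltas.
    target = last_pose + np.cumsum(audio_chunk, axis=0)
    x = rng.standard_normal(target.shape)  # Gaussian noise at t = 0
    dt = 1.0 / steps
    for k in range(steps):  # fixed-step Euler integration from t = 0 to t = 1
        x = x + dt * toy_velocity(x, k * dt, target)
    return x

def stream(audio, chunk_len=4, seed=0):
    # Generate motion chunk by chunk, carrying the last pose across chunks.
    rng = np.random.default_rng(seed)
    last_pose = np.zeros(audio.shape[1])
    chunks = []
    for start in range(0, len(audio), chunk_len):
        chunk = sample_chunk(audio[start:start + chunk_len], last_pose, rng)
        last_pose = chunk[-1]
        chunks.append(chunk)
    return np.concatenate(chunks, axis=0)

audio = np.random.default_rng(1).standard_normal((10, 3))  # toy features
motion = stream(audio)
```

In this toy setup the streamed output matches what a single pass over the whole sequence would produce, which is the kind of cross-chunk coherence the abstract describes. A real model would replace `toy_velocity` with a learned network and would likely condition on more than one past frame.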

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
