PianoFlow, a flow-matching framework, generates bimanual piano motion from audio by distilling MIDI priors into an audio-driven model during training, which improves semantic understanding while keeping inference audio-only. It uses an asymmetric role-gated interaction module to capture dynamic cross-hand coordination and an autoregressive flow continuation scheme for real-time streaming generation. Experiments on PianoMotion10M show that PianoFlow achieves superior quantitative and qualitative performance and over 9x faster inference than prior methods.
Real-time piano motion generation just got a whole lot smoother: PianoFlow speeds up inference by more than 9x while improving quality by distilling MIDI knowledge into an audio-driven model.
Audio-driven bimanual piano motion generation requires precise modeling of complex musical structures and dynamic cross-hand coordination. However, existing methods often rely on acoustic-only representations lacking symbolic priors, employ inflexible interaction mechanisms, and are limited to computationally expensive short-sequence generation. To address these limitations, we propose PianoFlow, a flow-matching framework for precise and coordinated bimanual piano motion synthesis. Our approach strategically leverages MIDI as a privileged modality during training, distilling these structured musical priors to achieve deep semantic understanding while maintaining audio-only inference. Furthermore, we introduce an asymmetric role-gated interaction module to explicitly capture dynamic cross-hand coordination through role-aware attention and temporal gating. To enable real-time streaming generation for arbitrarily long sequences, we design an autoregressive flow continuation scheme that ensures seamless cross-chunk temporal coherence. Extensive experiments on the PianoMotion10M dataset demonstrate that PianoFlow achieves superior quantitative and qualitative performance, while accelerating inference by over 9× compared to previous methods.
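
The abstract does not spell out how the autoregressive flow continuation works, so the following is only a generic sketch of the idea of chunked flow-matching sampling, not PianoFlow's implementation. It assumes a common pattern: each chunk is produced by Euler integration of a velocity field from noise (t = 0) to motion (t = 1), and the first few frames of each new chunk are clamped to the tail of the previous chunk so that consecutive chunks join smoothly. The names `toy_velocity`, `sample_chunk`, and `stream_generate`, as well as the chunk length, overlap, and step count, are hypothetical. The velocity field is a toy stand-in for a trained network that simply flows toward the conditioning features.

```python
import numpy as np

def toy_velocity(x, t, cond):
    # Hypothetical stand-in for a learned, audio-conditioned velocity network.
    # It is the exact velocity of a straight-line path that ends at `cond`,
    # so the sampler below lands on `cond` at t = 1.
    return (cond - x) / (1.0 - t)

def sample_chunk(cond, prefix, steps, rng):
    # Integrate dx/dt = v(x, t, cond) from noise (t = 0) to a sample (t = 1)
    # with fixed-step Euler, keeping the first len(prefix) frames pinned to
    # frames that were already generated (assumed continuation mechanism).
    x = rng.standard_normal(cond.shape)
    k = len(prefix)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x[:k] = prefix                      # clamp the known overlap frames
        x = x + dt * toy_velocity(x, t, cond)
    x[:k] = prefix                          # final clamp after the last step
    return x

def stream_generate(cond_seq, chunk_len=8, overlap=2, steps=4, seed=0):
    # Generate an arbitrarily long sequence chunk by chunk. Each new chunk
    # re-covers the last `overlap` generated frames and appends only the rest.
    rng = np.random.default_rng(seed)
    out = np.empty((0, cond_seq.shape[1]))
    while len(out) < len(cond_seq):
        k = min(overlap, len(out))          # no overlap for the first chunk
        start = len(out) - k
        cond = cond_seq[start:start + chunk_len]
        prefix = out[len(out) - k:]
        chunk = sample_chunk(cond, prefix, steps, rng)
        out = np.concatenate([out, chunk[k:]], axis=0)
    return out

if __name__ == "__main__":
    # 20 frames of 2-D toy "audio features"; a real system would use learned
    # audio embeddings and output hand-pose parameters instead.
    audio = np.linspace(0.0, 1.0, 40).reshape(20, 2)
    motion = stream_generate(audio)
    print(motion.shape)
```

Because the toy velocity field flows directly toward its conditioning input, the generated sequence simply reproduces the input features; the sketch illustrates only the control flow of few-step sampling with overlapping chunks, not the quality or speed reported in the paper.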