SyncTrack, a novel multi-track waveform music generation model, is introduced to address the lack of rhythmic stability and synchronization in existing models. The architecture incorporates track-shared modules with cross-track attention for rhythm synchronization and track-specific modules with learnable instrument priors for timbre representation. Experiments using novel rhythmic consistency metrics (IRS, CBS, CBD) demonstrate that SyncTrack significantly improves multi-track music quality by enhancing rhythmic consistency.
Multi-track music generation gets a rhythmic upgrade: SyncTrack uses cross-track attention and learnable instrument priors to generate rhythmically stable, synchronized tracks.
Multi-track music generation has garnered significant research interest due to its precise mixing and remixing capabilities. However, existing models often overlook essential attributes such as rhythmic stability and synchronization, focusing on the differences between tracks rather than on their inherent properties. In this paper, we introduce SyncTrack, a synchronous multi-track waveform music generation model designed to capture the unique characteristics of multi-track music. SyncTrack features a novel architecture that includes track-shared modules to establish a common rhythm across all tracks and track-specific modules to accommodate diverse timbres and pitch ranges. Each track-shared module employs two cross-track attention mechanisms to synchronize rhythmic information, while each track-specific module utilizes learnable instrument priors to better represent timbre and other unique features. Additionally, we extend the evaluation of multi-track music quality to cover rhythmic consistency by introducing three novel metrics: Inner-track Rhythmic Stability (IRS), Cross-track Beat Synchronization (CBS), and Cross-track Beat Dispersion (CBD). Experiments demonstrate that SyncTrack significantly improves multi-track music quality by enhancing rhythmic consistency.
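To make the idea of cross-track attention concrete, the following is a minimal NumPy sketch of one generic way tracks could exchange information at each time frame. It is an illustration under stated assumptions, not the SyncTrack implementation: the abstract does not specify the two attention mechanisms, so the function name `cross_track_attention`, the tensor layout `(tracks, frames, dim)`, the single-head formulation, and the projection matrices `w_q`, `w_k`, `w_v` are all hypothetical.

```python
import numpy as np


def cross_track_attention(x, w_q, w_k, w_v):
    """Single-head attention over the track axis (illustrative assumption).

    x: array of shape (tracks, frames, dim).
    At every frame, each track attends to all tracks at that same frame,
    so timing information can be shared across tracks.
    Returns the mixed features and the attention weights.
    """
    q = x @ w_q  # (tracks, frames, dim)
    k = x @ w_k
    v = x @ w_v

    # Put frames first so the matrix products run over the track axis.
    q, k, v = (a.transpose(1, 0, 2) for a in (q, k, v))  # (frames, tracks, dim)

    # Scaled dot-product scores between tracks, one matrix per frame.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (frames, tracks, tracks)

    # Numerically stable softmax over the attended-to tracks.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    out = weights @ v  # (frames, tracks, dim)
    return out.transpose(1, 0, 2), weights


# Toy usage with random features: 4 tracks, 8 frames, 16 channels.
rng = np.random.default_rng(0)
tracks, frames, dim = 4, 8, 16
x = rng.standard_normal((tracks, frames, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
y, attn = cross_track_attention(x, w_q, w_k, w_v)
```

In this sketch the output `y` keeps the input layout, and `attn` holds one tracks-by-tracks weight matrix per frame. A track-specific module would then process each track separately, for instance after combining it with a learned per-instrument embedding; that step is omitted here because the abstract gives no detail on how the instrument priors are applied.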