This paper introduces AdaTT, a target-adaptive system for instrument timbre transfer. It addresses timbral ambiguity, which arises when expressive details carried over from the source instrument conflict with the timbral properties of the target. AdaTT selectively scales the frame-wise influence of pitch and loudness controls, guided by text prompts, so that the output matches the target instrument's identity across diverse transfer scenarios. The reported results show superior timbral fidelity and naturalness while score-level content is retained.
AdaTT improves timbral fidelity in instrument timbre transfer by adapting the influence of pitch and loudness controls to the target instrument.
This paper addresses timbral ambiguity in instrument timbre transfer under fine-grained structural conditions. We argue this issue stems from instrument-specific expressive details in these conditions, which conflict with the target timbral properties. For example, imposing a violin's pitch-dominant vibrato contours onto a flute, which naturally exhibits loudness-dominant vibrato, impairs timbral fidelity. We propose AdaTT, a target-adaptive system that ensures high timbral fidelity across diverse timbre transfer scenarios within the ControlNet scheme. It selectively scales the frame-wise influence of pitch and loudness controls via text prompts to match the target instrument's identity. We also present a semi-automatic data construction pipeline to teach the model which expressive details to transform or preserve. Results show AdaTT achieves superior timbral fidelity and naturalness while retaining score-level content. Audio samples are available at https://dabinkim0.github.io/adatt/.
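To make the idea of frame-wise control scaling concrete, here is a minimal sketch, not the authors' implementation. It assumes that each control (pitch, loudness) contributes a per-frame feature vector and that a per-frame gate in [0, 1] sets how much of each contribution is kept. The function name `scale_controls`, the feature shapes, and the gate values are all hypothetical; in AdaTT the scaling is driven by text prompts within a ControlNet scheme, which this sketch leaves out.

```python
from typing import List, Sequence


def scale_controls(
    pitch_feat: Sequence[Sequence[float]],
    loudness_feat: Sequence[Sequence[float]],
    pitch_gate: Sequence[float],
    loudness_gate: Sequence[float],
) -> List[List[float]]:
    """Mix two control streams frame by frame (hypothetical helper).

    pitch_feat, loudness_feat: one feature vector per frame.
    pitch_gate, loudness_gate: one scalar per frame, clamped to [0, 1].
    """
    n_frames = len(pitch_feat)
    if not (len(loudness_feat) == len(pitch_gate) == len(loudness_gate) == n_frames):
        raise ValueError("all inputs must cover the same number of frames")

    mixed = []
    for p_row, l_row, pg, lg in zip(pitch_feat, loudness_feat, pitch_gate, loudness_gate):
        # Clamp the gates so a control can only be attenuated, never amplified.
        pg = min(max(pg, 0.0), 1.0)
        lg = min(max(lg, 0.0), 1.0)
        mixed.append([pg * p + lg * l for p, l in zip(p_row, l_row)])
    return mixed


# Toy input: 4 frames, 2 feature dimensions (values are made up).
pitch_feat = [[1.0, 1.0] for _ in range(4)]
loudness_feat = [[2.0, 2.0] for _ in range(4)]

# Assumed gates: the pitch contour fades out while loudness takes over,
# loosely mirroring a move toward loudness-dominant vibrato.
pitch_gate = [1.0, 0.5, 0.0, 0.0]
loudness_gate = [0.0, 0.5, 1.0, 1.0]

mixed = scale_controls(pitch_feat, loudness_feat, pitch_gate, loudness_gate)
```

With these toy gates, the first frame keeps only the pitch features and the last two keep only the loudness features. A violin-to-flute transfer as described in the abstract might plausibly call for this kind of shift, though how AdaTT actually derives its scaling from the prompt is not specified here.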