This paper introduces Instrumental, a system that recovers synthesizer parameters from audio by pairing a differentiable subtractive synthesizer with CMA-ES, a derivative-free evolutionary optimizer. The system optimizes a composite perceptual loss that combines mel-scaled STFT, spectral centroid, and MFCC divergence. Experiments on real audio show that CMA-ES outperforms gradient descent and that initializing from spectral analysis accelerates convergence relative to random starts.
Recovering synthesizer parameters directly from audio is now possible with Instrumental, a system that combines a differentiable synthesizer with evolutionary optimization, opening new avenues for timbral analysis and manipulation.
Existing audio-to-MIDI tools extract notes but discard the timbral characteristics that define an instrument's identity. We present Instrumental, a system that recovers continuous synthesizer parameters from audio by coupling a differentiable 28-parameter subtractive synthesizer with CMA-ES, a derivative-free evolutionary optimizer. We optimize a composite perceptual loss combining mel-scaled STFT, spectral centroid, and MFCC divergence, achieving a matching loss of 2.09 on real recorded audio. We systematically evaluate eight hypotheses for improving convergence and find that only parametric EQ boosting yields meaningful improvement. Our results show that CMA-ES outperforms gradient descent on this non-convex landscape, that more parameters do not monotonically improve matching, and that spectral analysis initialization accelerates convergence over random starts.
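The optimization loop described in the abstract can be illustrated with a small, self-contained sketch. It is not the Instrumental implementation, and every name in it is hypothetical: `toy_synth` is a 2-parameter stand-in for the 28-parameter synthesizer, the loss keeps only a log-magnitude spectral term (in place of the mel-scaled STFT term) and a spectral centroid term and omits MFCC divergence, and the optimizer is a simple elitist (1+λ) evolution strategy with isotropic Gaussian mutation rather than CMA-ES, which additionally adapts a full covariance matrix. The loss weights are arbitrary assumptions.

```python
import cmath
import math
import random

N_SAMPLES = 64    # toy frame length (assumption)
N_HARMONICS = 8   # harmonics in the toy oscillator (assumption)


def toy_synth(params):
    """Hypothetical 2-parameter subtractive voice: a band-limited sawtooth
    (harmonic k has amplitude 1/k) shaped by a first-order low-pass response."""
    cutoff, gain = params
    cutoff = max(abs(cutoff), 1e-3)  # keep the filter cutoff strictly positive
    frame = []
    for t in range(N_SAMPLES):
        sample = 0.0
        for k in range(1, N_HARMONICS + 1):
            amp = (1.0 / k) / math.sqrt(1.0 + (k / cutoff) ** 2)
            # The fundamental sits on DFT bin 2, so harmonic k lands on bin 2k.
            sample += amp * math.sin(2.0 * math.pi * 2 * k * t / N_SAMPLES)
        frame.append(gain * sample)
    return frame


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (standard library only)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mag):
    """Magnitude-weighted mean bin index; 0.0 for an all-zero spectrum."""
    total = sum(mag)
    return sum(k * m for k, m in enumerate(mag)) / total if total > 0 else 0.0


def composite_loss(params, target_mag, w_spec=1.0, w_cent=0.1):
    """Weighted sum of a log-magnitude distance and a centroid distance."""
    mag = magnitude_spectrum(toy_synth(params))
    spec = sum(abs(math.log1p(a) - math.log1p(b)) for a, b in zip(mag, target_mag))
    spec /= len(mag)
    cent = abs(spectral_centroid(mag) - spectral_centroid(target_mag))
    return w_spec * spec + w_cent * cent


def evolve(loss_fn, start, sigma=0.5, offspring=6, generations=20, seed=0):
    """Elitist (1+lambda) evolution strategy: a simplified stand-in for CMA-ES."""
    rng = random.Random(seed)
    best, best_loss = list(start), loss_fn(start)
    for _ in range(generations):
        candidates = [
            [p + sigma * rng.gauss(0.0, 1.0) for p in best] for _ in range(offspring)
        ]
        top_loss, top = min(((loss_fn(c), c) for c in candidates), key=lambda s: s[0])
        if top_loss < best_loss:
            best, best_loss = top, top_loss
            sigma *= 1.2  # success: widen the search
        else:
            sigma *= 0.7  # failure: narrow the search
    return best, best_loss


# Usage: recover the parameters of a synthetic target from its spectrum alone.
target_mag = magnitude_spectrum(toy_synth([3.0, 1.0]))
params, loss = evolve(lambda p: composite_loss(p, target_mag), [1.0, 0.5])
print("recovered parameters:", params, "loss:", loss)
```

Because the strategy is elitist, the returned loss can never exceed the loss at the starting point. The optimizer only ever evaluates the loss and never needs its gradient, which is the property that lets an evolutionary method be applied to a non-convex matching landscape of the kind the abstract describes.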