This paper formalizes interpolating Stochastic Differential Equations (iSDEs) to encompass conditional diffusion models like SGMSE+ used in speech enhancement. It addresses the challenge of slow sampling in these models by developing a novel solver tailored for iSDEs. The proposed solver achieves high-quality speech restoration with as few as 10 neural network evaluations, significantly accelerating the sampling process.
Achieve high-quality speech restoration with conditional diffusion models in as few as 10 neural network evaluations via a novel iSDE solver.
Diffusion Probabilistic Models (DPMs) are a well-established class of diffusion models for unconditional image generation, while SGMSE+ is a well-established conditional diffusion model for speech enhancement. A downside of diffusion models is that solving the reverse process requires many evaluations of a large neural network. Although advanced fast sampling solvers have been developed for DPMs, they are not directly applicable to models such as SGMSE+ because the underlying diffusion processes differ. Specifically, DPMs transform between the data distribution and a standard Gaussian distribution, whereas SGMSE+ interpolates between the target distribution and a noisy observation. This work first develops a formalism of interpolating Stochastic Differential Equations (iSDEs) that includes SGMSE+, and then proposes a solver for iSDEs. The proposed solver enables fast sampling with as few as 10 neural network evaluations across multiple speech restoration tasks.
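The contrast between the two kinds of diffusion process can be illustrated with a minimal sketch. It assumes an Ornstein-Uhlenbeck-style drift gamma * (y - x) for the interpolating case and a variance-preserving process with constant beta for the DPM case; the function names and the values of `gamma` and `beta` are hypothetical choices for illustration, and the code shows only how the mean of each forward process evolves, not the paper's iSDE formalism or its solver.

```python
import math

def interp_mean(x0, y, t, gamma=1.5):
    """Mean at time t of a process with drift gamma * (y - x).

    Assumed illustrative form: the mean starts at the clean signal x0
    and decays exponentially toward the noisy observation y.
    """
    w = math.exp(-gamma * t)
    return [w * a + (1.0 - w) * b for a, b in zip(x0, y)]

def vp_mean(x0, t, beta=10.0):
    """Mean at time t of a variance-preserving process with constant beta.

    Assumed illustrative form: the mean starts at x0 and decays toward
    zero, the mean of a standard Gaussian prior.
    """
    w = math.exp(-0.5 * beta * t)
    return [w * a for a in x0]

if __name__ == "__main__":
    clean = [1.0, -0.5]   # hypothetical clean samples
    noisy = [1.4, 0.2]    # hypothetical noisy observation
    for t in (0.0, 0.5, 1.0):
        print(t, interp_mean(clean, noisy, t), vp_mean(clean, t))
```

In this sketch the interpolating mean ends near the observation `noisy`, while the variance-preserving mean ends near zero. A solver derived for the second case assumes that endpoint, which is consistent with the abstract's point that DPM solvers do not carry over directly.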