This paper introduces SB-RF, a one-step generative framework for speech enhancement that combines Rectified Flow (RF) with Schrödinger Bridge (SB) theory. It uses entropy-regularized optimal transport to build a conditional bridge between the clean and noisy speech distributions, and it reports leading performance among generative methods on the VoiceBank-DEMAND benchmark. On a simulated low signal-to-noise ratio test set, SB-RF also shows competitive robustness at low inference cost, which supports its use in practical applications.
SB-RF achieves leading performance among generative speech enhancement methods on VoiceBank-DEMAND while requiring only a single generation step instead of multi-step inference.
Generative models have shown impressive results in speech enhancement but often suffer from multi-step inference. We propose SB-RF, a one-step generative framework integrating Rectified Flow (RF) with Schrödinger Bridge (SB) theory. SB-RF constructs a conditional bridge between clean and noisy speech distributions via entropy-regularized optimal transport. By aligning SB trajectories with the optimal transport geodesic through the velocity-matching objective of RF, SB-RF enables high-quality enhancement with one-step generation. Experiments demonstrate that SB-RF achieves leading performance among generative methods on the VoiceBank-DEMAND benchmark. Furthermore, to fully assess performance in challenging real-world scenarios, we evaluate SB-RF on a simulated low signal-to-noise ratio test set using an expanded training dataset. Under these conditions, SB-RF exhibits strong and competitive robustness with high efficiency, validating its potential for real-world applications.
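The velocity-matching idea and the one-step generation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a time convention where t = 0 is the noisy signal and t = 1 is the clean signal, a Brownian-bridge form for the conditional path with a hypothetical noise scale `sigma`, and a straight-line rectified-flow target. All function names are invented for illustration, and plain 1-D arrays stand in for speech features.

```python
import numpy as np


def bridge_sample(clean, noisy, t, sigma, rng):
    """Sample a point on an assumed Brownian-bridge path.

    The path starts at `noisy` (t=0) and ends at `clean` (t=1).
    The bridge noise vanishes at both endpoints.
    """
    mean = (1.0 - t) * noisy + t * clean
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(clean.shape)


def velocity_target(clean, noisy):
    """Rectified-flow target: constant velocity along the straight line."""
    return clean - noisy


def velocity_matching_loss(pred_velocity, clean, noisy):
    """Mean squared error between predicted and target velocity."""
    target = velocity_target(clean, noisy)
    return float(np.mean((pred_velocity - target) ** 2))


def one_step_enhance(velocity_fn, noisy):
    """One Euler step over the whole interval [0, 1], starting at t=0."""
    return noisy + velocity_fn(noisy, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16)                 # stand-in for clean speech
    noisy = clean + 0.5 * rng.standard_normal(16)   # stand-in for noisy speech

    # Intermediate training input on the assumed bridge path.
    x_t = bridge_sample(clean, noisy, t=0.5, sigma=0.3, rng=rng)

    # Hypothetical oracle that already knows the true velocity.
    # A trained network would take its place in practice.
    def oracle(x, t):
        return clean - noisy

    enhanced = one_step_enhance(oracle, noisy)
    print(np.allclose(enhanced, clean))
```

With an oracle velocity, a single Euler step recovers the clean signal exactly, because the target velocity is constant along a straight path. A learned velocity field only approximates this, so the quality of one-step generation depends on how straight the learned trajectories are.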