The paper introduces SELEBI, a novel phase vocoder algorithm for time-stretching audio that reduces percussion smearing artifacts. SELEBI addresses the time-scale mismatch between smeared magnitude spectrograms and localized phase by using a nonstationary Gabor transform to dynamically adapt analysis window lengths based on signal energy, thereby localizing the magnitude spectrogram around percussive events. Experiments demonstrate that SELEBI effectively mitigates percussion smearing while maintaining natural sound quality due to the perfect reconstruction property of the nonstationary Gabor transform.
A signal-adaptive phase vocoder that dynamically adjusts its analysis window lengths can time-stretch audio with significantly less of the dreaded "percussion smearing."
Phase vocoder-based time-stretching is a widely used technique for the time-scale modification of audio signals. However, conventional implementations suffer from "percussion smearing," a well-known artifact that significantly degrades the quality of percussive components. We attribute this artifact to a fundamental time-scale mismatch between the temporally smeared magnitude spectrogram and the localized, newly generated phase. To address this, we propose SELEBI, a signal-adaptive phase vocoder algorithm that significantly reduces percussion smearing while preserving stability and the perfect reconstruction property. Unlike conventional methods that rely on heuristic processing or component separation, our approach leverages the nonstationary Gabor transform. By dynamically adapting analysis window lengths to assign short windows to intervals containing significant energy associated with percussive components, we directly compute a temporally localized magnitude spectrogram from the time-domain signal. This approach ensures greater consistency between the temporal structures of the magnitude and phase. Furthermore, the perfect reconstruction property of the nonstationary Gabor transform guarantees stable, high-fidelity signal synthesis, in contrast to previous heuristic approaches. Experimental results demonstrate that the proposed method effectively mitigates percussion smearing and yields natural sound quality.
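To make the idea of energy-driven window adaptation concrete, the following is a minimal sketch of how short windows might be assigned to frames with a sharp energy rise. It is not the SELEBI algorithm: the abstract does not specify the selection rule, so the frame-to-frame energy ratio test, the function names `frame_energy` and `choose_window_lengths`, and the parameters `hop`, `short`, `long`, `ratio`, and `eps` are all assumptions made for illustration. The sketch only selects window lengths; it does not compute a nonstationary Gabor transform or resynthesize audio.

```python
import numpy as np


def frame_energy(x, hop):
    """Energy of consecutive non-overlapping blocks of `hop` samples."""
    n_frames = len(x) // hop
    frames = x[: n_frames * hop].reshape(n_frames, hop)
    return np.sum(frames ** 2, axis=1)


def choose_window_lengths(x, hop=256, short=256, long=2048, ratio=4.0, eps=1e-12):
    """Pick a window length per frame (hypothetical rule, not from the paper).

    A frame is treated as percussive when its energy exceeds `ratio` times
    the energy of the previous frame; such frames get the short window,
    all other frames get the long one.
    """
    energy = frame_energy(x, hop)
    if energy.size == 0:
        return np.array([], dtype=int)
    # Shift energies by one frame; the first frame is compared with itself,
    # so it is never flagged.
    prev = np.concatenate(([energy[0]], energy[:-1]))
    percussive = energy > ratio * prev + eps
    return np.where(percussive, short, long)


# Toy signal: a quiet 440 Hz tone with a short, loud burst starting at sample 4100.
sr = 44100
t = np.arange(sr // 2)
x = 1e-3 * np.sin(2 * np.pi * 440 * t / sr)
x[4100:4130] += 1.0

lengths = choose_window_lengths(x)
print(np.flatnonzero(lengths == 256))  # indices of frames given the short window
```

In this toy case only the frame containing the burst receives the short window, so the magnitude of the transient would be computed from a temporally narrow slice of the signal, while the steady tone keeps the long window and its finer frequency resolution. A real implementation would additionally need window sets that form a valid frame so that perfect reconstruction holds, which this sketch does not address.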