This paper introduces a lightweight adaptation framework for speech enhancement models. It attaches low-rank adapters to a frozen backbone, which enables efficient on-device deployment in dynamic acoustic environments. The adapters are updated through self-supervised training, and the framework is evaluated across 111 environments spanning 37 noise types and three SNR ranges. Results show that updating fewer than 1% of the base model's parameters yields an average SI-SDR improvement of 1.51 dB within 20 updates per scene, with perceptual quality competitive with or better than existing methods.
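The abstract leaves out the adapter rank, which layers are adapted, the initialization, and the self-supervised objective. The sketch below therefore shows a generic LoRA-style layer and is not the authors' implementation. It illustrates why the trainable share stays small. A frozen weight $W \in \mathbb{R}^{d_{out} \times d_{in}}$ is combined with a trainable update $\frac{\alpha}{r} B A$, where $A \in \mathbb{R}^{r \times d_{in}}$ and $B \in \mathbb{R}^{d_{out} \times r}$. Only $r(d_{in} + d_{out})$ parameters are trained, compared with $d_{in} d_{out}$ for full fine-tuning. The class name `LoRALinear`, the rank, the `alpha` scaling, and the init scheme are illustrative assumptions.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical adapter layer: frozen W plus a trainable low-rank update (alpha / r) * B @ A.

    Assumption: generic LoRA-style design; the paper's exact adapter is not specified.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        self.weight = weight  # frozen backbone weight, shape (d_out, d_in); never updated
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: (rank, d_in) with small random values; B: (d_out, rank) all zeros,
        # so the adapted layer starts out identical to the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)              # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_fraction(self):
        """Share of this layer's parameters held by the adapter (A and B)."""
        d_out, d_in = len(self.weight), len(self.weight[0])
        rank = len(self.A)
        return rank * (d_in + d_out) / (d_in * d_out)


if __name__ == "__main__":
    layer = LoRALinear([[0.0] * 512 for _ in range(512)], rank=2)
    print(f"trainable share of this layer: {layer.trainable_fraction():.4%}")
```

For a hypothetical 512 × 512 layer with rank 2, the adapter holds 1/128 (about 0.78%) of that layer's parameters. The paper's "fewer than 1%" figure refers to the whole base model, and its actual per-layer ratio depends on choices the abstract does not report. During adaptation, only `A` and `B` would receive gradient updates. The training loop is omitted because the self-supervised loss is not described.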
Forget full fine-tuning: Low-rank adapters let you adapt speech enhancement models to new acoustic environments on-device, updating less than 1% of parameters for significant quality gains.
Recent studies have shown that post-deployment adaptation can improve the robustness of speech enhancement models in unseen noise conditions. However, existing methods often incur prohibitive computational and memory costs, limiting their suitability for on-device deployment. In this work, we investigate model adaptation in realistic settings with dynamic acoustic scene changes and propose a lightweight framework that augments a frozen backbone with low-rank adapters updated via self-supervised training. Experiments on sequential scene evaluations spanning 111 environments across 37 noise types and three signal-to-noise ratio ranges, including the challenging [-8, 0] dB range, show that our method updates fewer than 1% of the base model's parameters while achieving an average 1.51 dB SI-SDR improvement within only 20 updates per scene. Compared to state-of-the-art approaches, our framework achieves competitive or superior perceptual quality with smoother and more stable convergence, demonstrating its practicality for lightweight on-device adaptation of speech enhancement models under real-world acoustic conditions.
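The headline result is reported as an SI-SDR improvement in dB. The sketch below computes scale-invariant SDR using the standard definition: both signals are zero-meaned, the estimate is projected onto the reference, and target energy is compared with residual energy. The abstract does not say what the 1.51 dB gain is measured against (for example, the unadapted backbone or the noisy input). Here an "improvement" is simply assumed to be the difference between two `si_sdr` values. The function name and the `eps` guard are illustrative choices.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio (dB) of `estimate` w.r.t. `reference`."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have equal length")
    # Remove the mean of each signal, as in the usual SI-SDR definition.
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [e - mean_est for e in estimate]
    ref = [r - mean_ref for r in reference]
    # Optimal scale alpha = <est, ref> / ||ref||^2 projects the estimate onto the reference.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(new_output, old_output, reference):
    """Assumed definition: gain in SI-SDR of one output over another, in dB."""
    return si_sdr(new_output, reference) - si_sdr(old_output, reference)
```

Because of the projection step, rescaling the estimate leaves the score unchanged. For a speech enhancement model whose output gain can drift, this is why SI-SDR is often preferred over plain SNR.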