Mar 10, 2026 | arXiv:2603.09505

End-to-End Direction-Aware Keyword Spotting with Spatial Priors in Noisy Environments

Rui Wang, Zhifei Zhang, Yu Gao, Xiaofeng Mou, Yi Xu

AI Summary

This paper introduces an end-to-end multi-channel keyword spotting (KWS) framework that leverages spatial cues and directional priors to improve robustness in noisy environments. The system uses a spatial encoder to learn inter-channel features and a spatial embedding to incorporate directional priors, which are then processed by a streaming backbone. Experiments demonstrate that both spatial modeling and directional priors improve performance over baselines in simulated noisy conditions, with their combination yielding the best results.
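The data flow described above (inter-channel features, plus a directional prior, fused before the backbone) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the paper's method: the paper learns its spatial features and embedding, whereas here a hand-crafted inter-channel phase difference and a fixed sin/cos encoding of a target azimuth stand in for them. The function names `inter_channel_phase_differences`, `direction_embedding`, and `fuse` are hypothetical.

```python
import cmath
import math


def inter_channel_phase_differences(ref_bins, other_bins):
    """Phase difference per frequency bin between two channels' complex spectra.

    Stand-in for a learned spatial encoder (assumption, not from the paper).
    """
    return [cmath.phase(o * r.conjugate()) for r, o in zip(ref_bins, other_bins)]


def direction_embedding(azimuth_deg, num_harmonics=2):
    """Encode a target azimuth as sin/cos pairs so that 0 and 360 degrees coincide.

    Stand-in for a learned spatial embedding (assumption, not from the paper).
    """
    theta = math.radians(azimuth_deg)
    embedding = []
    for k in range(1, num_harmonics + 1):
        embedding.extend([math.cos(k * theta), math.sin(k * theta)])
    return embedding


def fuse(spatial_features, direction_vec):
    """Concatenate one frame's spatial features with the directional prior."""
    return list(spatial_features) + list(direction_vec)


# Toy frame: two frequency bins, second channel rotated by a quarter turn.
ref = [1 + 0j, 0 + 1j]
other = [0 + 1j, -1 + 0j]
frame = fuse(inter_channel_phase_differences(ref, other), direction_embedding(90.0))
```

In a real system each fused frame would be passed to the streaming backbone; concatenation is just one simple fusion choice, and the paper may fuse differently.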

Key Contribution

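Spatial audio cues and directional priors can be learned jointly in an end-to-end model. Each improves keyword spotting over the baselines in simulated noisy conditions, and their combination performs best. The end-to-end design also avoids the conventional cascaded pipeline, whose separate enhancement front end precludes joint optimization.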

Abstract

Keyword spotting (KWS) is crucial for many speech-driven applications, but robust KWS in noisy environments remains challenging. Conventional systems often rely on single-channel inputs and a cascaded pipeline separating front-end enhancement from KWS. This precludes joint optimization, inherently limiting performance. We present an end-to-end multi-channel KWS framework that exploits spatial cues to improve noise robustness. A spatial encoder learns inter-channel features, while a spatial embedding injects directional priors; the fused representation is processed by a streaming backbone. Experiments in simulated noisy conditions across multiple signal-to-noise ratios (SNRs) show that spatial modeling and directional priors each yield clear gains over baselines, with their combination achieving the best results. These findings validate end-to-end multi-channel spatial modeling, indicating strong potential for target-speaker-aware detection in complex acoustic scenarios.
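The abstract evaluates at multiple SNRs in simulation. As a rough illustration of what simulating a given SNR usually involves, the sketch below scales a noise signal so that the speech-to-noise power ratio matches a target in dB before adding it. The paper does not describe its simulation procedure, so this is a generic, single-channel sketch; `mix_at_snr` is a hypothetical helper.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling it to reach the target SNR in dB.

    Returns the mixture and the gain applied to the noise.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise signal has zero power")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    return mixture, gain


# Toy signals: a sine wave as "speech" and seeded Gaussian noise.
rng = random.Random(0)
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noisy, noise_gain = mix_at_snr(speech, noise, snr_db=5.0)
```

For multi-channel recordings, applying one common gain to all noise channels keeps the inter-channel relationships of the noise intact, which matters when the model relies on spatial cues.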

Architecture Design (Transformers, SSMs, MoE) | Speech & Audio | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
