Tsinghua AI · HKU · HUST · Soochow · UTokyo · WHU · Feb 13, 2026 · arXiv:2602.12783

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Yuejie Li, Yueying Hua, Berlin Chen, Jianhao Nie, Yueping He, Caixin Kang

AI Summary

The paper introduces SQuTR, a new benchmark for spoken query-to-text retrieval designed to evaluate robustness under realistic acoustic noise conditions. SQuTR comprises a large-scale dataset synthesized from 37,317 queries across six text retrieval datasets, using 200 voice profiles and 17 noise categories at varying SNR levels. Experiments on cascaded and end-to-end retrieval systems reveal significant performance degradation with increasing noise, highlighting robustness as a key challenge in spoken query retrieval.

Key Contribution

Retrieval models, even large ones, struggle under realistic acoustic noise, as revealed by the new SQuTR benchmark.

Abstract

Spoken query retrieval is an important interaction mode in modern information retrieval. However, existing evaluation datasets are often limited to simple queries under constrained noise conditions, making them inadequate for assessing the robustness of spoken query retrieval systems under complex acoustic perturbations. To address this limitation, we present SQuTR, a robustness benchmark for spoken query retrieval that includes a large-scale dataset and a unified evaluation protocol. SQuTR aggregates 37,317 unique queries from six commonly used English and Chinese text retrieval datasets, spanning multiple domains and diverse query types. We synthesize speech using voice profiles from 200 real speakers and mix 17 categories of real-world environmental noise under controlled SNR levels, enabling reproducible robustness evaluation from quiet to highly noisy conditions. Under the unified protocol, we conduct large-scale evaluations on representative cascaded and end-to-end retrieval systems. Experimental results show that retrieval performance decreases as noise increases, with substantially different drops across systems. Even large-scale retrieval models struggle under extreme noise, indicating that robustness remains a critical bottleneck. Overall, SQuTR provides a reproducible testbed for benchmarking and diagnostic analysis, and facilitates future research on robustness in spoken query to text retrieval.
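The abstract says noise is mixed "under controlled SNR levels" but does not spell out the procedure. As a rough illustration of what that usually means, the sketch below scales a noise clip so that the ratio of speech power to noise power matches a target SNR in dB, then adds it to the speech. This is a generic textbook formulation, not the SQuTR pipeline: the function names `mix_at_snr` and `measured_snr_db`, the list-of-floats waveform representation, and the choice to loop short noise clips are all assumptions made for the example.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` at a target SNR in dB (hypothetical helper).

    Both inputs are lists of float samples at the same sample rate.
    Returns (mixed, scaled_noise).
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Assumption: loop the noise clip if it is shorter than the utterance.
    if len(noise) < len(speech):
        repeats = len(speech) // len(noise) + 1
        noise = noise * repeats
    noise = noise[:len(speech)]

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


def measured_snr_db(speech, scaled_noise):
    """Recompute the SNR in dB from the clean speech and the scaled noise."""
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in scaled_noise) / len(scaled_noise)
    return 10 * math.log10(speech_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for real audio: a 220 Hz tone as "speech", Gaussian noise.
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(1600)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
    for target in (20, 10, 0, -5):  # illustrative levels, not the paper's
        _, scaled = mix_at_snr(speech, noise, target)
        print(target, round(measured_snr_db(speech, scaled), 3))
```

Fixing the SNR rather than the raw noise volume is what makes conditions comparable across speakers and noise categories: a loud and a quiet utterance both end up with the same speech-to-noise power ratio. A real pipeline would additionally need to handle resampling, clipping after the sum, and whether power is measured over the whole clip or only over voiced segments.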

Eval Frameworks & Benchmarks · Recommendation & Information Retrieval · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 42

Year: 2026

Venue: N/A
