Feb 18, 2026 · arXiv:2602.16399

Multi-Channel Replay Speech Detection using Acoustic Maps

AI Summary

This paper introduces acoustic maps, a spatial feature representation derived from beamforming over azimuth and elevation grids, for replay speech detection in multi-channel recordings. The acoustic maps encode directional energy distributions, capturing the physical differences between human speech and loudspeaker replay. A lightweight CNN with approximately 6k trainable parameters, operating on acoustic maps, achieves competitive performance on the ReMASC dataset, indicating that the feature space is compact and physically interpretable for replay attack detection.
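The beamforming step can be illustrated with a minimal delay-and-sum (steered response power) sketch in Python/NumPy. The paper only says "classical beamforming", so the choice of delay-and-sum, the far-field plane-wave model, the speed of sound of 343 m/s, the STFT settings, and the names `acoustic_map` and `unit_vector` are all assumptions for illustration, not the authors' implementation. For each (azimuth, elevation) cell, the sketch phase-aligns the channels as if a plane wave arrived from that direction, averages them, and records the output energy; a source in that direction adds coherently and produces a peak.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def unit_vector(az, el):
    """Hypothetical helper: direction (radians) -> Cartesian unit vector."""
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])


def acoustic_map(x, mic_pos, fs, azimuths, elevations, n_fft=512, hop=256):
    """Delay-and-sum steered response power on an elevation x azimuth grid.

    x       : (M, N) multichannel waveform, N >= n_fft
    mic_pos : (M, 3) microphone coordinates in metres
    returns : (len(elevations), len(azimuths)) energy map, max-normalised to 1
    """
    _, n_samples = x.shape
    window = np.hanning(n_fft)
    starts = range(0, n_samples - n_fft + 1, hop)
    # Windowed frames per channel: (M, n_frames, n_fft) -> spectra (M, n_frames, n_bins)
    frames = np.stack([x[:, s:s + n_fft] * window for s in starts], axis=1)
    spec = np.fft.rfft(frames, axis=-1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    amap = np.zeros((len(elevations), len(azimuths)))
    for i, el in enumerate(elevations):
        for j, az in enumerate(azimuths):
            # Far-field model: mic m hears the wavefront (p_m . u) / c seconds early
            advance = mic_pos @ unit_vector(az, el) / SPEED_OF_SOUND  # (M,)
            # Phase terms that cancel each mic's advance, aligning the channels
            steer = np.exp(-2j * np.pi * np.outer(advance, freqs))    # (M, n_bins)
            beam = (steer[:, None, :] * spec).mean(axis=0)            # (n_frames, n_bins)
            amap[i, j] = np.sum(np.abs(beam) ** 2)
    return amap / amap.max()
```

The result is a small 2-D (elevation x azimuth) array, the kind of image-like input a compact CNN can consume. A planar array cannot tell sources above the array plane from sources below it, so an elevation grid that spans both signs needs at least one microphone off that plane.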

Key Contribution

Acoustic maps offer a compact and physically interpretable feature space that allows lightweight CNNs to effectively detect replay attacks, even across diverse devices and acoustic environments.

Abstract

Replay attacks remain a critical vulnerability for automatic speaker verification systems, particularly in real-time voice assistant applications. In this work, we propose acoustic maps as a novel spatial feature representation for replay speech detection from multi-channel recordings. Derived from classical beamforming over discrete azimuth and elevation grids, acoustic maps encode directional energy distributions that reflect physical differences between human speech radiation and loudspeaker-based replay. A lightweight convolutional neural network is designed to operate on this representation, achieving competitive performance on the ReMASC dataset with approximately 6k trainable parameters. Experimental results show that acoustic maps provide a compact and physically interpretable feature space for replay attack detection across different devices and acoustic environments.
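Simple arithmetic puts the figure of roughly 6k trainable parameters in perspective. The sketch below counts parameters for a hypothetical stack: four 3x3 convolutions with channel widths 1, 8, 16, 16, 16 on a single-channel acoustic map, global average pooling, and a two-class (genuine vs. replay) linear layer. This layer configuration is invented for illustration and is not the architecture described in the paper. Each k x k convolution contributes c_out(c_in k^2 + 1) parameters, bias included.

```python
def conv2d_params(c_in, c_out, k):
    """Weights plus one bias per output channel for a k x k convolution."""
    return c_out * (c_in * k * k + 1)


def linear_params(n_in, n_out):
    """Weights plus one bias per output unit for a fully connected layer."""
    return n_out * (n_in + 1)


# Hypothetical channel widths (NOT the paper's architecture).
CONV_CHANNELS = [1, 8, 16, 16, 16]


def count_params(channels=CONV_CHANNELS, k=3, n_classes=2):
    total = sum(conv2d_params(a, b, k)
                for a, b in zip(channels[:-1], channels[1:]))
    # Global average pooling has no parameters; the classifier sees one value per channel.
    total += linear_params(channels[-1], n_classes)
    return total


print(count_params())  # 5922
```

With these assumed widths the total is 5,922, so a network in this size class amounts to only a few narrow convolutional layers operating on a small 2-D map.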

Red-Teaming & Adversarial Robustness · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 19

Year: 2026

Venue: N/A
