This paper explores GaborNet, a Gabor filter-based front-end for raw audio processing, within RawNet2 and RawGAT-ST architectures for audio spoof detection. The study investigates modifications to GaborNet, such as squared modulus and Gaussian Lowpass Pooling, to handle complex filter outputs. Results show that GaborNet, particularly with specific modifications, can enhance the performance of RawNet2 and RawGAT-ST in detecting audio spoofing attacks, even with audio augmentations like codec conversions and additive noises.
GaborNet, a Gabor filter-based front-end for raw audio processing, can improve audio spoof detection performance in the RawNet2 and RawGAT-ST architectures.
One direction of development in audio feature extraction is to process raw samples in the time domain. This approach appears to be effective, especially in the era of neural networks. An example is SincNet, in which the core of the neural network layer is a set of sinc functions that are convolved with the input signal. Because the sinc functions have finite length, distortions appear in the frequency domain of the convolved signal, just as they do when a signal is windowed. Recently, a new approach has been developed that replaces the sinc functions with Gabor filters. Because the filter outputs are complex-valued, further modifications had to be applied, such as squared modulus or Gaussian Lowpass Pooling. In this work, an ingestion layer based on a bank of Gabor filters, named GaborNet, and its modifications are examined in depth within the popular RawNet2 and RawGAT-ST architectures, which were developed for audio spoof detection. Audio augmentation using codec conversions, room responses, and additive noises is also investigated.
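As a rough illustration of the idea described in the abstract, the sketch below convolves a raw signal with a small bank of complex Gabor filters (a Gaussian envelope multiplied by a complex sinusoid) and applies the squared modulus to obtain real-valued outputs. It is a minimal NumPy sketch, not the paper's implementation: the function names `gabor_kernel` and `gabor_frontend`, the filter length, the envelope width `sigma`, and the center frequencies are all assumptions chosen for illustration, and the filters are fixed rather than learned. Gaussian Lowpass Pooling is not shown.

```python
import numpy as np


def gabor_kernel(center_freq, sigma, length):
    """Complex Gabor filter (hypothetical helper, illustrative parameters).

    center_freq: center frequency in radians per sample.
    sigma: standard deviation of the Gaussian envelope, in samples.
    length: number of filter taps.
    """
    t = np.arange(length) - (length - 1) / 2  # time axis centered on zero
    envelope = np.exp(-t**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return envelope * np.exp(1j * center_freq * t)


def gabor_frontend(signal, center_freqs, sigma=20.0, length=129):
    """Filter a 1-D signal with each Gabor kernel, then take the squared modulus.

    Returns a real array of shape (len(center_freqs), len(signal)).
    """
    outputs = []
    for freq in center_freqs:
        kernel = gabor_kernel(freq, sigma, length)
        filtered = np.convolve(signal, kernel, mode="same")  # complex-valued
        outputs.append(np.abs(filtered) ** 2)  # squared modulus -> real-valued
    return np.stack(outputs)


# Usage: a tone at 0.5 rad/sample should excite the filter centered there
# far more strongly than a filter centered at 2.0 rad/sample.
n = np.arange(1000)
tone = np.cos(0.5 * n)
features = gabor_frontend(tone, center_freqs=[0.5, 2.0])
```

The squared modulus discards the phase of the complex filter output and keeps its energy, which is what allows real-valued downstream layers to consume the result.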