CEA | Keio | The AGH University of Krakow | Feb 18, 2026 | arXiv:2602.16442

Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA

Kamil Jeziorek, Piotr Wzorek, Krzysztof Błachut, Hiroshi Nakano, Manon Dampfhoffer, Thomas Mesquida, Hiroaki Nishi, Thomas Dalgaty, Tomasz Kryjak

AI Summary

The paper presents an FPGA implementation of event-graph neural networks (EGNNs) for audio processing, leveraging an artificial cochlea to convert time-series signals into sparse event data. The authors achieve competitive accuracy on SHD and SSC datasets compared to state-of-the-art methods, while significantly reducing parameter count and outperforming FPGA-based spiking neural networks in accuracy and resource usage. They also demonstrate an end-to-end FPGA implementation of event-audio keyword spotting (KWS) with low latency and power consumption, establishing a benchmark for energy-efficient event-driven KWS.

Key Contribution

Event-based graph neural networks running on FPGAs can outperform FPGA-based spiking neural networks in audio classification accuracy by up to 19.3%, while using fewer resources and reducing latency.

Abstract

As the volume of data recorded by embedded edge sensors increases, particularly from neuromorphic devices producing discrete event streams, there is a growing need for hardware-aware neural architectures that enable efficient, low-latency, and energy-conscious local processing. We present an FPGA implementation of event-graph neural networks for audio processing. We utilise an artificial cochlea that converts time-series signals into sparse event data, reducing memory and computation costs. Our architecture was implemented on an SoC FPGA and evaluated on two open-source datasets. For the classification task, our baseline floating-point model achieves 92.7% accuracy on the SHD dataset, only 2.4% below the state of the art, while requiring over 10x and 67x fewer parameters. On SSC, our models achieve 66.9-71.0% accuracy. Compared to FPGA-based spiking neural networks, our quantised model reaches 92.3% accuracy, outperforming them by up to 19.3% while reducing resource usage and latency. For SSC, we report the first hardware-accelerated evaluation. We further demonstrate the first end-to-end FPGA implementation of event-audio keyword spotting, combining graph convolutional layers with recurrent sequence modelling. The system achieves up to 95% word-end detection accuracy, with a latency of only 10.53 microseconds and a power consumption of 1.18 W, establishing a strong benchmark for energy-efficient event-driven KWS.
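To make the idea of an event graph more concrete, the following is a minimal sketch of how sparse cochlea events could be linked into a graph. It is an illustration only: the abstract does not describe the paper's graph construction, so the event layout `(timestamp_us, channel)`, the function name `build_event_graph`, and the thresholds `max_dt_us` and `max_dchannel` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp in microseconds, cochlea channel index).
Event = Tuple[int, int]


def build_event_graph(
    events: List[Event],
    max_dt_us: int = 1000,
    max_dchannel: int = 2,
) -> List[Tuple[int, int]]:
    """Return directed edges (src, dst) from earlier events to later ones.

    Two events are connected when they are close in both time and channel.
    Events must be sorted by timestamp. The default thresholds are
    illustrative, not values taken from the paper.
    """
    edges = []
    start = 0  # index of the oldest event still inside the time window
    for dst, (t_dst, ch_dst) in enumerate(events):
        # Slide the window forward past events older than max_dt_us.
        while events[start][0] < t_dst - max_dt_us:
            start += 1
        # Only events inside the window are candidate neighbours.
        for src in range(start, dst):
            _, ch_src = events[src]
            if abs(ch_dst - ch_src) <= max_dchannel:
                edges.append((src, dst))
    return edges


events = [(0, 10), (200, 11), (900, 30), (1100, 12), (5000, 10)]
print(build_event_graph(events))  # [(0, 1), (1, 3)]
```

In this sketch the work scales with the number of events and their local neighbours rather than with the length of the raw audio signal, which is one plausible reading of why a sparse event representation reduces memory and computation.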

Architecture Design (Transformers, SSMs, MoE) | Distributed Systems & Hardware | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 66

Year: 2026

Venue: N/A
