The paper presents an FPGA implementation of event-graph neural networks (EGNNs) for audio processing, leveraging an artificial cochlea to convert time-series signals into sparse event data. The authors achieve competitive accuracy on SHD and SSC datasets compared to state-of-the-art methods, while significantly reducing parameter count and outperforming FPGA-based spiking neural networks in accuracy and resource usage. They also demonstrate an end-to-end FPGA implementation of event-audio keyword spotting (KWS) with low latency and power consumption, establishing a benchmark for energy-efficient event-driven KWS.
Event-based graph neural networks running on FPGAs can outperform spiking neural networks in audio classification by up to 19.3% while using fewer resources and slashing latency.
As the volume of data recorded by embedded edge sensors increases, particularly from neuromorphic devices producing discrete event streams, there is a growing need for hardware-aware neural architectures that enable efficient, low-latency, and energy-conscious local processing. We present an FPGA implementation of event-graph neural networks for audio processing. We utilise an artificial cochlea that converts time-series signals into sparse event data, reducing memory and computation costs. Our architecture was implemented on a SoC FPGA and evaluated on two open-source datasets. For the classification task, our baseline floating-point model achieves 92.7% accuracy on the SHD dataset, only 2.4% below the state of the art, while requiring over 10x and 67x fewer parameters. On SSC, our models achieve 66.9-71.0% accuracy. Compared to FPGA-based spiking neural networks, our quantised model reaches 92.3% accuracy, outperforming them by up to 19.3% while reducing resource usage and latency. For SSC, we report the first hardware-accelerated evaluation. We further demonstrate the first end-to-end FPGA implementation of event-audio keyword spotting, combining graph convolutional layers with recurrent sequence modelling. The system achieves up to 95% word-end detection accuracy, with a latency of only 10.53 microseconds and a power consumption of 1.18 W, establishing a strong benchmark for energy-efficient event-driven KWS.
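The abstract does not say how the sparse cochlea events are turned into a graph, so the following is only an illustrative sketch of one common approach, not the authors' method. It assumes each event is a (timestamp, channel) pair and links an event to earlier events that fall within a short time window and a small channel distance. The function name `build_event_graph` and the parameters `max_dt` and `max_dch` are hypothetical.

```python
from typing import List, Tuple

# Assumed event format: (timestamp in seconds, cochlea channel index).
Event = Tuple[float, int]


def build_event_graph(
    events: List[Event],
    max_dt: float = 0.01,  # hypothetical temporal radius in seconds
    max_dch: int = 2,      # hypothetical channel radius
) -> Tuple[List[Event], List[Tuple[int, int]]]:
    """Return time-sorted events and directed edges (older -> newer).

    An edge (i, j) is created when event i precedes event j by at most
    max_dt seconds and their channels differ by at most max_dch.
    """
    nodes = sorted(events)
    edges: List[Tuple[int, int]] = []
    start = 0  # index of the oldest event still inside the time window
    for j, (t_j, c_j) in enumerate(nodes):
        # Slide the window forward; it never passes j because
        # nodes[j] itself always satisfies the time condition.
        while nodes[start][0] < t_j - max_dt:
            start += 1
        for i in range(start, j):
            if abs(c_j - nodes[i][1]) <= max_dch:
                edges.append((i, j))
    return nodes, edges


if __name__ == "__main__":
    sample = [(0.000, 3), (0.004, 4), (0.005, 10), (0.030, 3)]
    nodes, edges = build_event_graph(sample)
    print(edges)
```

Because edges only point from older to newer events, a graph built this way can be extended one event at a time as the stream arrives. That property suits low-latency streaming hardware, though whether the paper relies on it is not stated in the abstract.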