Embedded Systems Lab | Apr 21, 2026 | arXiv:2604.19293

Energy Efficient LSTM Accelerators for Embedded FPGAs Through Parameterised Architecture Design

Chao Qian, Tianheng Ling, Gregor Schiele

AI Summary

This paper introduces a parameterizable hardware accelerator for LSTMs optimized for embedded FPGAs, targeting on-device time series analysis. The design allows for adjusting parameters like DSP usage and activation function implementation to improve execution speed and reduce energy consumption. Evaluation shows the accelerator achieves an energy efficiency of 11.89 GOP/s/W during real-time inference.

Key Contribution

Achieves LSTM acceleration on embedded FPGAs with an energy efficiency of 11.89 GOP/s/W by tuning architectural parameters.

Abstract

Long Short-term Memory Networks (LSTMs) are a vital Deep Learning technique suitable for performing on-device time series analysis on local sensor data streams of embedded devices. In this paper, we propose a new hardware accelerator design for LSTMs specially optimised for resource-scarce embedded Field Programmable Gate Arrays (FPGAs). Our design improves the execution speed and reduces energy consumption compared to related work. Moreover, it can be adapted to different situations using a number of optimisation parameters, such as the usage of DSPs or the implementation of activation functions. We present our key design decisions and evaluate the performance. Our accelerator achieves an energy efficiency of 11.89 GOP/s/W during a real-time inference with 32873 samples/s.
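To make the reported metric concrete, the sketch below shows one common way a GOP/s/W figure is related to throughput and power. It is an illustration only, not the authors' evaluation method: the function names, the convention of counting one multiply-accumulate as two operations, and the choice to count only the four gate matrix-vector products are all assumptions. The only value taken from the abstract is 11.89 GOP/s/W; the model dimensions in the usage lines are hypothetical.

```python
def lstm_ops_per_step(input_size: int, hidden_size: int) -> int:
    """Approximate operation count for one LSTM time step.

    Assumption: only the four gate matrix-vector products are counted,
    and each multiply-accumulate (MAC) counts as 2 operations.
    Element-wise products and activation functions are ignored.
    """
    macs = 4 * hidden_size * (input_size + hidden_size)
    return 2 * macs


def energy_efficiency_gops_per_watt(ops_per_sample: float,
                                    samples_per_s: float,
                                    power_w: float) -> float:
    """Energy efficiency in GOP/s/W from throughput and average power."""
    ops_per_second = ops_per_sample * samples_per_s
    return ops_per_second / power_w / 1e9


def energy_per_op_pj(gops_per_watt: float) -> float:
    """Energy per operation in picojoules.

    1 GOP/s/W equals 1 GOP/J, so the energy per operation is the
    reciprocal of the efficiency figure.
    """
    joules_per_op = 1.0 / (gops_per_watt * 1e9)
    return joules_per_op * 1e12


if __name__ == "__main__":
    # Hypothetical model size, chosen only to exercise the helper.
    print(lstm_ops_per_step(input_size=1, hidden_size=20))
    # Efficiency figure from the abstract: about 84.1 pJ per operation.
    print(round(energy_per_op_pj(11.89), 1))
```

Read this way, the reported 11.89 GOP/s/W corresponds to roughly 84.1 pJ per operation, regardless of how operations per sample are counted. Recovering the power draw from the 32873 samples/s figure would additionally require the per-sample operation count, which the abstract does not state.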

Architecture Design (Transformers, SSMs, MoE) | Distributed Systems & Hardware | Inference & Quantization

Citation Metrics

Citations: 7

Influential citations: 0

References: 19

Year: 2026

Venue: ARCS
