The paper introduces ImageHD, an FPGA accelerator for energy-efficient on-device continual learning of visual representations using hyperdimensional computing (HDC). It addresses limitations of prior HDC-based CL systems by implementing a hardware-aware CL method with bounded class exemplars, a unified exemplar memory, and a hardware-efficient cluster merging strategy, alongside a quantized CNN front-end. Implemented as a streaming dataflow architecture on an AMD Zynq ZCU104 FPGA, ImageHD achieves up to 40.4x speedup and 383x higher energy efficiency over a CPU baseline (4.84x and 105.1x over GPU) on the CORe50 dataset.
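The bounded exemplar memory can be illustrated with a minimal sketch: when a class exceeds its exemplar budget, the two most similar exemplars are fused so memory stays constant. The budget, hypervector size, and the agreement-keeping merge rule below are illustrative assumptions, not the paper's exact parameters or merging strategy.

```python
import numpy as np

D_WORDS = 128   # 8192-bit hypervectors packed into 64-bit words (illustrative size)
BUDGET = 4      # per-class exemplar bound (hypothetical value)
rng = np.random.default_rng(0)

def popcount(words: np.ndarray) -> int:
    """Total set bits across an array of packed 64-bit words."""
    return int(np.unpackbits(words.view(np.uint8)).sum())

def hamming_sim(a: np.ndarray, b: np.ndarray) -> int:
    """Bits in agreement between two packed binary hypervectors."""
    return a.size * 64 - popcount(a ^ b)

def merge(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Fuse two exemplars: keep bits they agree on, break disagreements
    randomly (a stand-in for the paper's hardware-efficient merging rule)."""
    tie = np.frombuffer(rng.bytes(a.size * 8), dtype=np.uint64)
    return (a & b) | ((a ^ b) & tie)

def insert_exemplar(exemplars: list, hv: np.ndarray) -> None:
    """Add an exemplar; if the class budget is exceeded, merge the two
    most similar exemplars so per-class memory stays bounded."""
    exemplars.append(hv)
    if len(exemplars) <= BUDGET:
        return
    best, (bi, bj) = -1, (0, 1)
    for i in range(len(exemplars)):
        for j in range(i + 1, len(exemplars)):
            s = hamming_sim(exemplars[i], exemplars[j])
            if s > best:
                best, (bi, bj) = s, (i, j)
    exemplars[bi] = merge(exemplars[bi], exemplars[bj])
    exemplars.pop(bj)
```

Because both similarity and merging reduce to XOR, AND, OR, and popcount over packed words, this kind of rule maps naturally onto FPGA bitwise logic.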
Edge devices can now learn continuously from visual data with up to 40x speedup and 383x better energy efficiency, thanks to a novel FPGA accelerator design.
On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative through fast, non-iterative online updates. Combined with a compact convolutional neural network (CNN) feature extractor, HDC enables efficient on-device adaptation with strong visual representations. However, prior HDC-based CL systems often depend on multi-tier memory hierarchies and complex cluster management, limiting deployability on resource-constrained hardware. We present ImageHD, an FPGA accelerator for on-device continual learning of visual data based on HDC. ImageHD targets streaming CL under strict latency and on-chip memory constraints, avoiding costly iterative optimization. At the algorithmic level, we introduce a hardware-aware CL method that bounds class exemplars through a unified exemplar memory and a hardware-efficient cluster merging strategy, while incorporating a quantized CNN front-end to reduce deployment overhead without sacrificing accuracy. At the system level, ImageHD is implemented as a streaming dataflow architecture on the AMD Zynq ZCU104 FPGA, integrating HDC encoding, similarity search, and bounded cluster management using word-packed binary hypervectors for massively parallel bitwise computation within tight on-chip resource budgets. On CORe50, ImageHD achieves up to 40.4x (4.84x) speedup and 383x (105.1x) higher energy efficiency over optimized CPU (GPU) baselines, demonstrating the practicality of HDC-enabled continual learning for real-time edge AI.
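The HDC encoding and similarity-search pipeline described above can be sketched in a few lines. The sign-of-random-projection encoder, the 8192-bit dimensionality, and the feature width are assumptions for illustration; the paper's actual encoder and parameters may differ. What the sketch does show faithfully is the word-packed representation, where classification reduces to XOR-plus-popcount over 64-bit words, the bitwise pattern that parallelizes well on FPGA fabric.

```python
import numpy as np

D = 8192           # hypervector dimensionality (illustrative)
WORDS = D // 64    # 64-bit words per packed hypervector
rng = np.random.default_rng(1)

def popcount(words: np.ndarray) -> int:
    """Set bits across packed 64-bit words (popcount over the whole HV)."""
    return int(np.unpackbits(words.view(np.uint8)).sum())

def encode(features: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Binarize a random projection of CNN features, then bit-pack into
    64-bit words. Sign-of-projection is one common HDC encoder; it stands
    in here for the paper's (unspecified) encoding scheme."""
    bits = (features @ proj > 0).astype(np.uint8)
    return np.packbits(bits).view(np.uint64)

def classify(query: np.ndarray, class_hvs: dict) -> str:
    """Nearest class by Hamming similarity: XOR the packed words against
    each class hypervector and count agreeing bits."""
    return max(class_hvs, key=lambda c: D - popcount(query ^ class_hvs[c]))
```

A query encoded from the same features as a stored class hypervector matches that class exactly (similarity D), while unrelated random hypervectors agree on roughly D/2 bits, which is what makes the nearest-class search robust.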