The paper introduces a methodology for implementing real-time few-shot learning on resource-constrained FPGA SoCs using arbitrary fixed-point bit-widths, overcoming the 16- or 32-bit restriction of existing Tensil-based design environments. By adopting the FINN framework, optimizing Transpose nodes, and converting the final reduce mean operation to Global Average Pooling (GAP), the authors reduce the bit-width without sacrificing accuracy. Experimental results on CIFAR-10 show approximately doubled throughput compared to the conventional 16- or 32-bit implementations.
Squeezing few-shot learning onto tiny FPGAs just got easier: this work unlocks arbitrary bit-width quantization, doubling throughput on CIFAR-10 without accuracy loss.
In this study, we propose a methodology for implementing real-time few-shot learning on tiny FPGA SoCs, such as the PYNQ-Z1 board, with arbitrary fixed-point bit-widths. Conventional Tensil-based design environments have limited hardware implementations to fixed-point bit-widths of 16 or 32 bits. To address this, we adopt the FINN framework, which enables implementations with arbitrary bit-widths. Several customizations and minor adjustments are made, including: (1) optimization of Transpose nodes to resolve data format mismatches, and (2) added handling that converts the final reduce mean operation to Global Average Pooling (GAP). These adjustments allow us to reduce the bit-width while maintaining the same accuracy as the conventional implementation, and achieve approximately twice the throughput in evaluations using the CIFAR-10 dataset.
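The arithmetic behind the two adjustments and the bit-width reduction can be illustrated with a minimal NumPy sketch. It is not FINN code and does not reproduce the authors' transformations. The function names, the 8-channel 4x4 feature map, and the signed fixed-point format (total bits and fractional bits) are all assumptions chosen for illustration. The sketch shows that a reduce mean over the spatial axes of a channels-first (NCHW) tensor gives the same result as global average pooling over the transposed channels-last (NHWC) tensor, and how a value is rounded to a fixed-point grid of arbitrary width.

```python
import numpy as np


def quantize_fixed_point(x, total_bits, frac_bits):
    """Round x to a signed fixed-point grid (hypothetical helper).

    total_bits counts the sign bit; frac_bits of them are fractional.
    Values outside the representable range are saturated.
    """
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(np.asarray(x) * scale), qmin, qmax)
    return q / scale


def reduce_mean_spatial_nchw(x):
    """Reduce mean over H and W of an NCHW tensor, returning shape (N, C)."""
    return x.mean(axis=(2, 3))


def global_average_pool_nhwc(x):
    """Global average pooling over H and W of an NHWC tensor, shape (N, C)."""
    n, h, w, c = x.shape
    return x.reshape(n, h * w, c).sum(axis=1) / (h * w)


# Assumed example feature map: 1 image, 8 channels, 4x4 spatial size.
rng = np.random.default_rng(0)
features_nchw = rng.standard_normal((1, 8, 4, 4))

# A transpose converts channels-first data to channels-last data.
features_nhwc = features_nchw.transpose(0, 2, 3, 1)

rm = reduce_mean_spatial_nchw(features_nchw)
gap = global_average_pool_nhwc(features_nhwc)

# The two operations agree up to floating-point rounding.
same_result = np.allclose(rm, gap)

# Quantizing to a narrower word changes the grid spacing and range,
# here 6 bits with 4 fractional bits (spacing 1/16).
gap_q6 = quantize_fixed_point(gap, total_bits=6, frac_bits=4)
```

With 6 total bits and 4 fractional bits, representable values are multiples of 1/16 in the range [-2, 1.9375], so the quantized output differs from the unquantized one by at most 1/32 as long as no value saturates. Whether a given width preserves few-shot accuracy is an empirical question that the paper answers through its CIFAR-10 evaluation, not something this sketch establishes.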