The paper introduces DARTH-PUM, a hybrid processing-using-memory (PUM) architecture that combines analog and Boolean PUM within the same memory array to enable general-purpose computation. It addresses the hardware and software challenges of this combination with optimized peripheral circuitry, coordinating hardware that manages both PUM types, an easy-to-use programming interface, and low-cost support for flexible data widths. The reported speedups over an analog+CPU baseline are 59.4x for AES encryption, 14.8x for CNNs, and 40.8x for LLMs.
Analog in-memory computing breaks free from specialized ML inference by integrating with Boolean PUM, enabling up to 59x speedups on general-purpose kernels like AES encryption.
Analog processing-using-memory (PUM; a.k.a. in-memory computing) makes use of electrical interactions inside memory arrays to perform bulk matrix-vector multiplication (MVM) operations. However, many popular matrix-based kernels need to execute non-MVM operations, which analog PUM cannot directly perform. To retain their energy efficiency, analog PUM architectures augment memory arrays with CMOS-based domain-specific fixed-function hardware to provide complete kernel functionality, but the difficulty of integrating such specialized CMOS logic with memory arrays has largely limited analog PUM to being an accelerator for machine learning inference, or for closely related kernels. An opportunity exists to harness analog PUM for general-purpose computation: recent works have shown that memory arrays can also perform Boolean PUM operations, albeit with very different supporting hardware and electrical signals than analog PUM. We propose DARTH-PUM, a general-purpose hybrid PUM architecture that tackles key hardware and software challenges in integrating analog PUM and digital PUM. We propose optimized peripheral circuitry, coordinating hardware to manage and interface between both types of PUM, an easy-to-use programming interface, and low-cost support for flexible data widths. These design elements allow us to build a practical PUM architecture that can execute kernels fully in memory, and can scale easily to cater to domains ranging from embedded applications to large-scale data-driven computing. We show how three popular applications (AES encryption, convolutional neural networks, and large language models) can map to and benefit from DARTH-PUM, with speedups of 59.4x, 14.8x, and 40.8x, respectively, over an analog+CPU baseline.
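To make the division of labor in the abstract concrete, the following is a minimal software sketch of a hybrid kernel: a bulk MVM step (the part analog PUM handles) followed by an element-wise non-MVM step built only from a Boolean primitive. Everything here is an assumption for illustration: the function names (`analog_mvm`, `nor`, `xor_from_nor`, `hybrid_kernel`) are hypothetical, the choice of NOR as the Boolean primitive is assumed rather than taken from the paper, and the code models neither DARTH-PUM's hardware nor its programming interface.

```python
# Toy software model only. Names and structure are hypothetical and are not
# DARTH-PUM's actual interface or circuitry.

def analog_mvm(matrix, vector):
    """Stand-in for an analog PUM array: one bulk matrix-vector multiply."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def nor(a, b, width=8):
    """Stand-in for a Boolean PUM primitive: bitwise NOR on width-bit words."""
    mask = (1 << width) - 1
    return ~(a | b) & mask

def xor_from_nor(a, b, width=8):
    """XOR composed from five NOR operations (assumed primitive)."""
    n1 = nor(a, b, width)        # NOT a AND NOT b
    n2 = nor(a, n1, width)       # NOT a AND b
    n3 = nor(b, n1, width)       # a AND NOT b
    xnor = nor(n2, n3, width)    # NOT (a XOR b)
    return nor(xnor, xnor, width)  # invert to obtain a XOR b

def hybrid_kernel(matrix, vector, mask_words, width=8):
    """MVM step, then an element-wise XOR step that an MVM cannot express."""
    limit = (1 << width) - 1
    mvm_out = analog_mvm(matrix, vector)
    # Truncate each result to the word width before the Boolean step.
    return [xor_from_nor(y & limit, m, width) for y, m in zip(mvm_out, mask_words)]

result = hybrid_kernel([[1, 2], [3, 4]], [5, 6], [0x0F, 0xF0])
print(result)
```

In this sketch the MVM and the XOR stage run back to back without leaving the "memory" model, which is the property the abstract describes as executing kernels fully in memory; a real design would additionally need the coordinating hardware and data-width handling the abstract mentions.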