Mar 16, 2026 · arXiv:2603.14988

bitSMM: A bit-Serial Matrix Multiplication Accelerator

AI Summary

This paper introduces bitSMM, a bit-serial matrix multiplication accelerator designed to meet the power and reliability constraints of neural network inference on board spacecraft. The accelerator is built around a systolic array of bit-serial multiply-accumulate (MAC) units with runtime-configurable operand precision (1 to 16 bits). Evaluated on both FPGA (AMD ZCU104) and ASIC (asap7, nangate45), bitSMM achieves up to 19.2 GOPS and 2.973 GOPS/W on the FPGA, and up to 73.22 GOPS, 552 GOPS/mm$^2$, and 40.8 GOPS/W in asap7.

Key Contribution

For spacecraft-based neural networks, bit-serial matrix multiplication offers a compelling path to high GOPS/W within stringent power and reliability constraints.

Abstract

Neural-network (NN) inference is increasingly present on-board spacecraft to reduce downlink bandwidth and enable timely decision making. However, the power and reliability constraints of space missions limit the applicability of many state-of-the-art NN accelerators. This paper presents bitSMM, a bit-serial matrix multiplication accelerator built around a systolic array of bit-serial multiply-accumulate (MAC) units. The design supports runtime-configurable operand precision from 1 to 16 bits and evaluates two MAC variants: a Booth-inspired architecture and a standard binary multiplication with correction architecture. We implement bitSMM in [System]Verilog and evaluate it on an AMD ZCU104 FPGA and through ASIC physical implementation using the asap7 and nangate45 process design kits. On the FPGA, bitSMM achieves up to 19.2 GOPS and 2.973 GOPS/W, and in asap7 it achieves up to 73.22 GOPS, 552 GOPS/mm$^2$, and 40.8 GOPS/W.
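
To make the two MAC variants named in the abstract more concrete, the following is a minimal software sketch of bit-serial signed multiply-accumulate in Python. It is not the paper's hardware design: the function names, the LSB-first bit order, and the use of textbook radix-2 Booth recoding are assumptions made for illustration, and the paper's "Booth-inspired" unit may differ. Both functions consume one bit of the operand b per loop iteration, and the width parameter stands in for the runtime-configurable precision (1 to 16 bits).

```python
def to_twos_complement_bits(value, width):
    """Return the two's-complement bits of `value`, LSB first (hypothetical helper)."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {width} signed bits")
    return [(value >> i) & 1 for i in range(width)]


def bit_serial_mac_correction(acc, a, b, width):
    """acc + a*b using plain binary multiplication plus a sign-bit correction.

    Bits 0..width-2 of b add the shifted multiplicand; the top (sign) bit
    carries weight -2^(width-1), so its partial product is subtracted.
    """
    for i, bit in enumerate(to_twos_complement_bits(b, width)):
        if bit:
            partial = a << i
            acc += -partial if i == width - 1 else partial
    return acc


def bit_serial_mac_booth(acc, a, b, width):
    """acc + a*b using textbook radix-2 Booth recoding of b.

    Each step looks at the current and previous bit of b and applies the
    digit (previous - current), which is -1, 0, or +1. No separate sign
    correction is needed.
    """
    prev = 0
    for i, bit in enumerate(to_twos_complement_bits(b, width)):
        digit = prev - bit
        acc += digit * (a << i)
        prev = bit
    return acc


def matmul_bit_serial(A, B, width, mac=bit_serial_mac_booth):
    """Multiply integer matrices A (n x k) and B (k x m) with a bit-serial MAC."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for t in range(k):
                C[i][j] = mac(C[i][j], A[i][t], B[t][j], width)
    return C
```

For example, bit_serial_mac_correction(0, 3, -2, 4) and bit_serial_mac_booth(0, 3, -2, 4) both return -6. In this sketch a narrower width means fewer loop iterations per multiplication, which mirrors the usual motivation for bit-serial arithmetic: lower precision costs fewer cycles.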

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
