Mar 31, 2026 | arXiv:2603.29913

SISA: A Scale-In Systolic Array for GEMM Acceleration

Luigi Altamura, Alessio Cicero, Mateo Vázquez Maceiras, Mohammad Ali Maleki, Pedro Trancoso

AI Summary

This paper introduces SISA, a systolic array architecture that partitions a traditional square array into independently scheduled horizontal rectangular slabs to address the underutilization of processing elements (PEs) caused by input-dependent and skewed matrices in LLMs. SISA enables efficient execution of small or skewed matrix shapes by exposing parallelism through these independently scheduled slabs, while still supporting full-array operation for large GEMMs. Experimental results demonstrate that SISA achieves up to 8.52x speedup and 93% energy-delay-product (EDP) reduction compared to a monolithic SA with the same number of PEs when running representative LLMs.

Key Contribution

LLMs' skewed matrix shapes need not hamstring systolic array performance: SISA's partitioned architecture achieves up to 8.52x speedup and 93% EDP reduction compared to monolithic arrays.

Abstract

The currently dominant AI/ML workloads, such as Large Language Models (LLMs), rely on the efficient execution of General Matrix-Matrix Multiplication (GEMM) operations. Thus, most systems are equipped with dedicated matrix hardware accelerators based on square Systolic Arrays (SAs) of Processing Elements (PEs). While this organization was effective for traditional Deep Neural Networks (DNNs), LLMs introduce input-dependent and highly skewed matrices, leading to underutilized SA resources. To address this challenge, we propose SISA (Scale-In Systolic Array), a novel SA architecture that partitions the traditional square array into horizontal rectangular slabs. With minimal overhead, SISA exposes parallelism through independently scheduled slabs for efficient execution of small or skewed matrix shapes, while retaining full-array operation for large GEMMs. SISA achieves up to 8.52x speedup and 93% energy-delay-product (EDP) reduction for representative LLMs compared to a state-of-the-art monolithic SA with the same number of PEs.
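The underutilization argument can be made concrete with a toy model. The sketch below is not the paper's simulator or dataflow: it assumes a simple output-stationary mapping in which the M dimension of the result maps to array rows and N to array columns, and it ignores the K dimension, pipeline fill/drain latency, and memory bandwidth. The function names `tile_passes` and `utilization` are hypothetical, and the numbers it produces are illustrative only; they do not reproduce the reported 8.52x speedup.

```python
from math import ceil


def tile_passes(m, n, rows, cols):
    """Number of output tiles needed to cover an m x n result
    when each tile holds at most rows x cols outputs."""
    return ceil(m / rows) * ceil(n / cols)


def utilization(m, n, rows, cols, slabs=1):
    """Fraction of PE slots doing useful work in a toy output-stationary model.

    The rows x cols array is split into `slabs` horizontal slabs of
    rows // slabs rows each. Every slab processes one output tile per pass,
    independently of the other slabs (an assumption of this sketch).
    """
    if slabs < 1 or rows % slabs != 0:
        raise ValueError("slabs must evenly divide the number of rows")
    slab_rows = rows // slabs
    tiles = tile_passes(m, n, slab_rows, cols)
    passes = ceil(tiles / slabs)  # slabs work on different tiles in parallel
    return (m * n) / (passes * rows * cols)


if __name__ == "__main__":
    # Skewed GEMM: 8 output rows, 4096 output columns, on 128 x 128 PEs.
    print("monolithic:", utilization(8, 4096, 128, 128))
    print("16 slabs:  ", utilization(8, 4096, 128, 128, slabs=16))
    # Large square GEMM keeps the monolithic array busy in this model.
    print("large GEMM:", utilization(4096, 4096, 128, 128))
```

In this model a result with only 8 rows occupies 8 of the 128 rows of a monolithic array, while sixteen 8-row slabs can each take a different column tile of the same result. Setting `slabs=1` corresponds to full-array operation for large GEMMs.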

Architecture Design (Transformers, SSMs, MoE) | Distributed Systems & Hardware | Inference & Quantization

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
