Feb 15, 2026 · arXiv:2602.14262

ABI: A tightly integrated, unified, sparsity-aware, reconfigurable, compute near-register file/cache GPU architecture with light-weight softmax for deep learning, linear algebra, and Ising compute

Siddhartha Raman Sundara Raman, Jaydeep P. Kulkarni

AI Summary

This paper introduces ABI, a near-memory GPU architecture designed to improve performance and energy efficiency across diverse workloads, including CNNs, GCNs, LLMs, linear programming, and Ising models. ABI integrates a sparsity-aware near-memory circuit and a lightweight softmax circuit, which provide about 1.5x and 1.6x energy savings, respectively. Reported results show 6-16x speedup and 6-13x energy savings compared to the MIAOW GPU, and ABI-enabled MI300 and Blackwell systems achieve about 4.5x speedup over their baseline counterparts.

Key Contribution

A near-memory GPU architecture that cuts energy consumption by up to 13x and raises speed by up to 16x relative to the MIAOW GPU across diverse workloads, including LLMs and Ising models.

Abstract

We present a tightly integrated and unified near-memory GPU architecture that delivers 6 to 16 times speedup and 6 to 13 times energy savings across Convolutional Neural Networks, Graph Convolutional Networks, Linear Programming, Large Language Models, and Ising workloads compared to MIAOW GPU. The design includes a custom sparsity-aware near-memory circuit providing about 1.5 times energy savings, and a lightweight softmax circuit providing about 1.6 times energy savings. The architecture supports reconfigurable compute up to INT16 with dynamic resolution updates and scales efficiently across problem sizes. ABI-enabled MI300 and Blackwell systems achieve about 4.5 times speedup over baseline MI300 and Blackwell.
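To make the term "sparsity-aware" concrete, here is a minimal software sketch of zero-skipping in an INT16 dot product. It is an illustration of the general idea only, not the paper's circuit: the function name `sparse_dot_int16` and the counter of issued multiply-accumulates are assumptions introduced for this example, and the abstract does not describe how ABI detects or skips zeros.

```python
def sparse_dot_int16(weights, activations):
    """Dot product that skips zero operands and counts the MACs actually issued.

    Hypothetical helper for illustration; not taken from the ABI paper.
    """
    INT16_MIN, INT16_MAX = -32768, 32767
    acc = 0          # Python int, so the accumulator cannot overflow here
    macs_issued = 0  # multiply-accumulates that were not skipped
    for w, a in zip(weights, activations):
        if not (INT16_MIN <= w <= INT16_MAX and INT16_MIN <= a <= INT16_MAX):
            raise ValueError("operand outside INT16 range")
        if w == 0 or a == 0:
            continue  # the product is zero, so no multiply-accumulate is needed
        acc += w * a
        macs_issued += 1
    return acc, macs_issued


result, macs = sparse_dot_int16([0, 3, 0, -2], [5, 4, 7, 0])
print(result, macs)
```

In this toy input only one of the four operand pairs has two nonzero values, so a single multiply-accumulate is issued. Hardware that avoids the skipped operations can save energy in proportion to the sparsity, which is the intuition behind the reported savings; the actual figure of about 1.5 times comes from the authors' measurements, not from this sketch.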

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
