Mar 4, 2026 · arXiv:2603.03880

Joint Hardware-Workload Co-Optimization for In-Memory Computing Accelerators

Olga Krestinskaya, Mohammed E. Fouda, Ahmed Eltawil, Khaled N. Salama

AI Summary

This paper introduces a joint hardware-workload co-optimization framework for in-memory computing (IMC) accelerators, based on an evolutionary algorithm. The framework targets generalized IMC architectures that perform well across multiple neural network workloads, whereas most existing methods optimize for a single workload. Evaluated on RRAM- and SRAM-based IMC architectures, the optimized designs reduce energy-delay-area product (EDAP) by up to 76.2% when optimizing across a small set of 4 workloads and by up to 95.5% across a large set of 9 workloads, compared to baseline methods.

Key Contribution

Stop designing specialized in-memory computing (IMC) hardware for single workloads: a new co-optimization framework slashes energy-delay-area product (EDAP) by up to 95.5% when generalizing across multiple neural networks.

Abstract

Software-hardware co-design is essential for optimizing in-memory computing (IMC) hardware accelerators for neural networks. However, most existing optimization frameworks target a single workload, leading to highly specialized hardware designs that do not generalize well across models and applications. In contrast, practical deployment scenarios require a single IMC platform that can efficiently support multiple neural network workloads. This work presents a joint hardware-workload co-optimization framework based on an optimized evolutionary algorithm for designing generalized IMC accelerator architectures. By explicitly capturing cross-workload trade-offs rather than optimizing for a single model, the proposed approach significantly reduces the performance gap between workload-specific and generalized IMC designs. The framework is evaluated on both RRAM- and SRAM-based IMC architectures, demonstrating strong robustness and adaptability across diverse design scenarios. Compared to baseline methods, the optimized designs achieve energy-delay-area product (EDAP) reductions of up to 76.2% and 95.5% when optimizing across a small set (4 workloads) and a large set (9 workloads), respectively. The source code of the framework is available at https://github.com/OlgaKrestinskaya/JointHardwareWorkloadOptimizationIMC.
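To make the idea of joint optimization concrete, the following is a minimal Python sketch of an evolutionary search that scores each candidate design by its EDAP summed over several workloads. It is not the authors' implementation: the design parameters, the workload list, the toy `edap` cost model, and the choice of a plain sum as the cross-workload aggregate are all illustrative assumptions. The actual framework is in the repository linked above.

```python
import random

# Hypothetical design space; these parameters and values are NOT from the paper.
DESIGN_SPACE = {
    "crossbar_size": [64, 128, 256, 512],
    "adc_bits": [4, 6, 8],
    "num_tiles": [8, 16, 32, 64],
}

# Hypothetical workloads, described only by a toy "size" in arbitrary units.
WORKLOADS = {"net_a": 1.0, "net_b": 4.0, "net_c": 16.0}


def edap(design, size):
    """Toy energy * delay * area model (a placeholder, not an IMC simulator)."""
    capacity = design["crossbar_size"] * design["num_tiles"] / 1024.0
    passes = max(1.0, size / capacity)  # larger networks need more passes
    energy = passes * design["crossbar_size"] * 2 ** design["adc_bits"]
    delay = passes * design["adc_bits"]
    area = design["num_tiles"] * design["crossbar_size"] ** 2
    return energy * delay * area


def joint_cost(design):
    """Assumed aggregate: sum of per-workload EDAP for one shared design."""
    return sum(edap(design, size) for size in WORKLOADS.values())


def random_design(rng):
    return {key: rng.choice(values) for key, values in DESIGN_SPACE.items()}


def mutate(design, rng):
    """Resample one randomly chosen parameter of the design."""
    child = dict(design)
    key = rng.choice(list(DESIGN_SPACE))
    child[key] = rng.choice(DESIGN_SPACE[key])
    return child


def evolve(generations=30, pop_size=12, seed=0):
    """Elitist evolutionary loop: keep the better half, refill by mutation."""
    rng = random.Random(seed)
    population = [random_design(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=joint_cost)
        parents = population[: pop_size // 2]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=joint_cost)


if __name__ == "__main__":
    best = evolve()
    print(best, joint_cost(best))
```

Because the fitness function sees every workload at once, a design that is excellent for one network but poor for the others is penalized. A single-workload optimizer cannot capture this trade-off. A plain sum lets the largest workload dominate, so a real framework would likely normalize or weight the per-workload terms.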

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 68

Year: 2026

Venue: N/A
