AccelOpt, a self-improving LLM agentic system, automates AI accelerator kernel optimization by iteratively generating and evaluating kernels based on past optimization experiences. Evaluated on NKIBench, a new benchmark suite of AWS Trainium kernels, AccelOpt improves average peak throughput from 49% to 61% on Trainium 1 and 45% to 59% on Trainium 2. Remarkably, AccelOpt achieves comparable kernel improvements to Claude Sonnet 4 at 1/26th the cost, demonstrating the potential of open-source LLMs for hardware optimization.
Open-source LLMs can now autonomously optimize AI accelerator kernels, matching the performance of proprietary models at a fraction of the cost.
We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator kernels with varying complexity extracted from real-world LLM workloads to evaluate the effectiveness of AccelOpt. Our evaluation confirms that AccelOpt's capability improves over time, boosting the average percentage of peak throughput from 49% to 61% on Trainium 1 and from 45% to 59% on Trainium 2 for NKIBench kernels. Moreover, AccelOpt is highly cost-effective: using open-source models, it matches the kernel improvements of Claude Sonnet 4 while being 26× cheaper. The code is open-sourced at https://github.com/zhang677/AccelOpt.
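To make the abstract's core loop concrete, here is a minimal, hypothetical sketch of a self-improving optimization loop with an optimization memory. All names (`OptimizationMemory`, `generate_candidate`, `optimize`) are illustrative assumptions, not AccelOpt's actual API; the LLM call is stubbed with a random perturbation, and a kernel is abstracted as its measured throughput.

```python
import random


class OptimizationMemory:
    """Curates slow-fast pairs from past iterations (hypothetical sketch,
    not AccelOpt's real data structure)."""

    def __init__(self):
        self.pairs = []  # (slow_throughput, fast_throughput, speedup)

    def add(self, slow, fast):
        # Only genuine improvements become reusable experiences.
        if fast > slow:
            self.pairs.append((slow, fast, fast / slow))

    def insights(self, k=3):
        # Retrieve the k largest-speedup experiences to guide generation.
        return sorted(self.pairs, key=lambda p: -p[2])[:k]


def generate_candidate(current, insights, rng):
    # Stand-in for the LLM generation step: propose a variant whose
    # expected gain grows with the number of retrieved insights.
    boost = 1.0 + 0.1 * len(insights)
    return current + rng.random() * boost


def optimize(baseline_throughput, steps=20, seed=0):
    """Iteratively generate, evaluate, and memorize kernel variants."""
    rng = random.Random(seed)
    memory = OptimizationMemory()
    best = baseline_throughput
    for _ in range(steps):
        candidate = generate_candidate(best, memory.insights(), rng)
        if candidate > best:          # "evaluate" step: keep only improvements
            memory.add(best, candidate)
            best = candidate
    return best
```

The key design point this sketch mirrors is that the memory feeds back into generation: each accepted slow-fast pair enlarges the insight set that conditions the next candidate, which is what makes the system self-improving rather than a stateless search.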