EvoKernel, a self-evolving agentic framework, is introduced to address the challenge of kernel synthesis on data-scarce Domain-Specific Architectures like NPUs by automating the lifecycle from initial drafting to continual refinement. It formulates the synthesis process as a memory-based reinforcement learning task with a novel value-driven retrieval mechanism that learns stage-specific Q-values. Evaluated on an NPU variant of KernelBench, EvoKernel improves correctness from 11.0% to 83.0% and achieves a median speedup of 3.60x over initial drafts, demonstrating the effectiveness of value-guided experience accumulation on niche hardware.
LLMs can now synthesize high-performance kernels for niche hardware like NPUs, even with limited data, thanks to a self-evolving agent that bootstraps and refines code via value-driven reinforcement learning.
Deploying Large Language Models to data-scarce programming domains poses significant challenges, particularly for kernel synthesis on emerging Domain-Specific Architectures, where a "Data Wall" limits available training data. While models excel on data-rich platforms like CUDA, they suffer catastrophic performance drops on data-scarce ecosystems such as NPU programming. To overcome this cold-start barrier without expensive fine-tuning, we introduce EvoKernel, a self-evolving agentic framework that automates the lifecycle of kernel synthesis from initial drafting to continual refinement. EvoKernel formulates the synthesis process as a memory-based reinforcement learning task. Through a novel value-driven retrieval mechanism, it learns stage-specific Q-values that prioritize experiences based on their contribution to the current objective, whether bootstrapping a feasible draft or iteratively refining latency. Furthermore, by enabling cross-task memory sharing, the agent generalizes insights from simple to complex operators. Evaluated on an NPU variant of KernelBench that we build, EvoKernel improves frontier models' correctness from 11.0% to 83.0% and achieves a median speedup of 3.60x over initial drafts through iterative refinement. This demonstrates that value-guided experience accumulation allows general-purpose models to master kernel synthesis on niche hardware ecosystems. Our official page is available at https://evokernel.zhuo.li.
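To make the value-driven retrieval idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation) of an experience memory that keeps a separate Q-value per synthesis stage (e.g. "draft" vs. "refine"), retrieves the highest-valued experiences for the current stage, and nudges each Q toward the observed reward with a running-average update. All class and method names are illustrative assumptions.

```python
from collections import defaultdict

class ExperienceMemory:
    """Sketch of stage-specific, value-driven experience retrieval:
    each stored experience carries one Q-value per stage, updated
    after the retrieved experience is used and the attempt is scored."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha            # learning rate for Q updates
        self.experiences = []         # list of (exp_id, payload)
        self.q = defaultdict(float)   # (exp_id, stage) -> Q-value

    def add(self, exp_id, payload):
        self.experiences.append((exp_id, payload))

    def retrieve(self, stage, k=2):
        """Return the top-k experiences by stage-specific Q-value."""
        ranked = sorted(self.experiences,
                        key=lambda e: self.q[(e[0], stage)],
                        reverse=True)
        return ranked[:k]

    def update(self, exp_id, stage, reward):
        """Move Q toward the observed reward, e.g. 1.0 if the generated
        kernel compiled and passed tests, or a normalized speedup."""
        key = (exp_id, stage)
        self.q[key] += self.alpha * (reward - self.q[key])
```

Under this reading, the same experience can be highly valued for bootstrapping a correct draft yet low-valued for latency refinement, since the two stages accumulate rewards independently.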