EvoKernel, a self-evolving agentic framework, is introduced to address the challenge of kernel synthesis on data-scarce Domain-Specific Architectures like NPUs by automating the lifecycle from initial drafting to continual refinement. It formulates the synthesis process as a memory-based reinforcement learning task with a novel value-driven retrieval mechanism that learns stage-specific Q-values. Evaluated on an NPU variant of KernelBench, EvoKernel improves correctness from 11.0% to 83.0% and achieves a median speedup of 3.60x over initial drafts, demonstrating the effectiveness of value-guided experience accumulation on niche hardware.
LLMs can now synthesize high-performance kernels for niche hardware like NPUs, even with limited data, thanks to a self-evolving agent that bootstraps and refines code via value-driven reinforcement learning.
Deploying Large Language Models to data-scarce programming domains poses significant challenges, particularly for kernel synthesis on emerging Domain-Specific Architectures, where a "Data Wall" limits available training data. While models excel on data-rich platforms like CUDA, they suffer catastrophic performance drops on data-scarce ecosystems such as NPU programming. To overcome this cold-start barrier without expensive fine-tuning, we introduce EvoKernel, a self-evolving agentic framework that automates the lifecycle of kernel synthesis from initial drafting to continual refinement. EvoKernel formulates the synthesis process as a memory-based reinforcement learning task. Through a novel value-driven retrieval mechanism, it learns stage-specific Q-values that prioritize experiences based on their contribution to the current objective, whether bootstrapping a feasible draft or iteratively refining latency. Furthermore, by enabling cross-task memory sharing, the agent generalizes insights from simple to complex operators. Evaluated on an NPU variant of KernelBench that we build, EvoKernel improves frontier models' correctness from 11.0% to 83.0% and achieves a median speedup of 3.60x over initial drafts through iterative refinement. This demonstrates that value-guided experience accumulation allows general-purpose models to master kernel synthesis on niche hardware ecosystems. Our official page is available at https://evokernel.zhuo.li.
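To make the value-driven retrieval idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation) of an experience memory that keeps a separate Q-value per synthesis stage (e.g. "draft" vs. "refine"), retrieves the highest-valued experiences for the current stage, and nudges each Q toward the observed reward with a running-average update. All class and method names are illustrative assumptions.

```python
from collections import defaultdict

class ExperienceMemory:
    """Sketch of stage-specific, value-driven experience retrieval:
    each stored experience carries one Q-value per stage, updated
    after the retrieved experience is used and the attempt is scored."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha            # learning rate for Q updates
        self.experiences = []         # list of (exp_id, payload)
        self.q = defaultdict(float)   # (exp_id, stage) -> Q-value

    def add(self, exp_id, payload):
        self.experiences.append((exp_id, payload))

    def retrieve(self, stage, k=2):
        """Return the top-k experiences by stage-specific Q-value."""
        ranked = sorted(self.experiences,
                        key=lambda e: self.q[(e[0], stage)],
                        reverse=True)
        return ranked[:k]

    def update(self, exp_id, stage, reward):
        """Move Q toward the observed reward, e.g. 1.0 if the generated
        kernel compiled and passed tests, or a normalized speedup."""
        key = (exp_id, stage)
        self.q[key] += self.alpha * (reward - self.q[key])
```

Under this reading, the same experience can be highly valued for bootstrapping a correct draft yet low-valued for latency refinement, since the two stages accumulate rewards independently.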