This paper introduces Axon, a synthesizing superoptimizer that automates the generation of high-performance tensor programs for tile-based AI accelerators. Axon uses program synthesis to generate target instructions from semantic specifications, explores semantically equivalent program variants, and selects the best-performing kernel empirically, with SMT-based checks guaranteeing that every transformation preserves semantics. It discovers algebraic transformations without hand-crafted rewrite rules and fuses operators and instructions to minimize memory traffic, reducing the programming burden on developers.
Axon automatically synthesizes high-performance tensor programs, reducing the manual optimization effort that AI accelerators otherwise require.
Writing high-performance kernels for AI accelerators requires deep expertise in tiling, instruction selection, data layout, and operator fusion, placing a significant burden on programmers. In this paper, we focus on tile-based AI accelerator programs and present Axon, a synthesizing superoptimizer for tensor programs: it uses program synthesis to automatically generate target instructions from semantic specifications, and explores semantically equivalent program variants to select the best-performing kernel empirically. Axon discovers algebraic transformations by propagating operators through computation graphs and uses SMT over unbounded tensors to guarantee that all transformations preserve semantics, without requiring hand-crafted rewrite rules. It then lowers tensor operations to target ISA instructions, explores tiling configurations constrained by hardware descriptions, and fuses operators and instructions to minimize memory traffic.
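To make the tiling step concrete, the following is a minimal sketch of enumerating tiling configurations under a hardware description and choosing the one with the least off-chip memory traffic. It is not Axon's implementation: the `HW` dictionary, its fields and values, the function names, and the analytical traffic model are all illustrative assumptions. In particular, the abstract says Axon selects kernels empirically, whereas this sketch substitutes a simple cost model for actual measurement.

```python
from itertools import product

# Hypothetical hardware description. The fields and values are illustrative
# assumptions, not Axon's actual description format.
HW = {
    "sram_bytes": 64 * 1024,  # on-chip scratchpad capacity
    "tile_multiple": 16,      # tile dimensions must be multiples of this
    "dtype_bytes": 2,         # element size, e.g. fp16
}


def legal_tilings(M, N, K, hw):
    """Yield (tm, tn, tk) tiles that evenly divide the problem and fit on chip."""
    step = hw["tile_multiple"]
    for tm, tn, tk in product(range(step, M + 1, step),
                              range(step, N + 1, step),
                              range(step, K + 1, step)):
        if M % tm or N % tn or K % tk:
            continue
        # One A tile, one B tile, and one C accumulator tile are resident together.
        footprint = (tm * tk + tk * tn + tm * tn) * hw["dtype_bytes"]
        if footprint <= hw["sram_bytes"]:
            yield tm, tn, tk


def offchip_traffic(M, N, K, tiling, hw):
    """Estimated bytes moved off chip for a tiled matmul C = A @ B."""
    tm, tn, _tk = tiling
    # A is re-read once per column block of C, B once per row block of C,
    # and C is written once. This is an assumed analytical model.
    elems = M * K * (N // tn) + K * N * (M // tm) + M * N
    return elems * hw["dtype_bytes"]


def best_tiling(M, N, K, hw):
    """Pick the legal tiling with the lowest estimated off-chip traffic."""
    return min(legal_tilings(M, N, K, hw),
               key=lambda t: offchip_traffic(M, N, K, t, hw))


if __name__ == "__main__":
    choice = best_tiling(256, 256, 256, HW)
    print(choice, offchip_traffic(256, 256, 256, choice, HW))
```

Under this model, larger output tiles reduce how often the inputs are re-read, so the search pushes `tm` and `tn` up until the scratchpad capacity becomes the binding constraint. A real superoptimizer would replace `offchip_traffic` with measurements on the target device.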