Tsinghua AI | Hangzhou Dianzi University | May 28, 2026 | arXiv:2605.29357

PassNet: Scaling Large Language Models for Graph Compiler Pass Generation

Yiqun Liu, Yingsheng Wu, Ruqi Yang, Enrong Zheng, Honglei Qiu, Sijun He, Tai Liang, Jingjing Wu, Yuhan Zhou, Yuhang Zhou, Yiwei Zhang, Dongyan Chen, Weihan Yi, Xinqi Li, Siqi Bao

AI Summary

The paper introduces PassNet, a large-scale ecosystem for LLM-based compiler pass generation, addressing performance bottlenecks in tensor compilers on long-tail workloads. PassNet comprises a dataset of 18K computational graphs and PassBench, a benchmark of 200 fusible tasks evaluated using the Error-aware Speedup Score (ES_t). Experiments show that while LLMs can outperform TorchInductor on individual subgraphs, consistency is a bottleneck, and fine-tuning a small model on PassNet trajectories significantly improves performance.

Key Contribution

LLMs can beat state-of-the-art tensor compilers on individual subgraphs, but struggle with consistency, revealing a path to unlock their full potential through targeted training.

Abstract

Modern tensor compilers such as TorchInductor deliver substantial speedups on mainstream models, yet face a systematic performance ceiling on long-tail workloads -- our profiling shows that 43% of real-world subgraphs experience end-to-end slowdowns under default compilation. While LLMs offer a path toward automated optimization, existing efforts focus on standalone kernel generation. We argue that pass generation -- where LLMs author structured graph transformations that integrate directly into compiler pipelines -- is the more appropriate abstraction. We propose PassNet, the first large-scale ecosystem for LLM-based compiler pass generation, comprising: (1) PassNet-Dataset, over 18K unique computational graphs from 100K real-world models; and (2) PassBench, 200 curated long-tail fusible tasks (comprising 2,060 subgraphs in total) evaluated under the Error-aware Speedup Score (ES_t) -- a metric unifying correctness, stability, and performance -- with layered integrity defenses against systematic LLM exploitation. Experiments reveal that PassBench is both highly discriminative and genuinely unsaturated: the best frontier model trails TorchInductor by 37% in aggregate, yet on individual subgraphs LLMs achieve up to 3x speedup over the same compiler -- indicating that the bottleneck is consistency, not capability. Fine-tuning a small model on merely ~4K PassNet trajectories yields a 2.67x improvement approaching frontier-model performance, demonstrating substantial headroom and validating PassNet as live training infrastructure for advancing LLM-driven compiler optimization. All data, benchmarks, and tooling are publicly available.

Architecture Design (Transformers, SSMs, MoE) | Code Generation & Program Synthesis | Training Efficiency & Optimization

Citation Metrics

Citations: 0
Influential citations: 0
References: 41
Year: 2026
Venue: N/A
