The paper introduces PassNet, a large-scale ecosystem for LLM-based compiler pass generation that targets the performance ceiling tensor compilers hit on long-tail workloads. PassNet comprises a dataset of over 18K computational graphs and PassBench, a benchmark of 200 fusible tasks evaluated using the Error-aware Speedup Score (ES_t). Experiments show that LLMs can outperform TorchInductor by up to 3x on individual subgraphs, but consistency is the bottleneck: the best frontier model still trails TorchInductor by 37% in aggregate. Fine-tuning a small model on about 4K PassNet trajectories yields a 2.67x improvement.
LLMs can beat state-of-the-art tensor compilers on individual subgraphs, but struggle with consistency, revealing a path to unlock their full potential through targeted training.
Modern tensor compilers such as TorchInductor deliver substantial speedups on mainstream models, yet face a systematic performance ceiling on long-tail workloads: our profiling shows that 43% of real-world subgraphs experience end-to-end slowdowns under default compilation. While LLMs offer a path toward automated optimization, existing efforts focus on standalone kernel generation. We argue that pass generation, where LLMs author structured graph transformations that integrate directly into compiler pipelines, is the more appropriate abstraction. We propose PassNet, the first large-scale ecosystem for LLM-based compiler pass generation, comprising: (1) PassNet-Dataset, over 18K unique computational graphs from 100K real-world models; and (2) PassBench, 200 curated long-tail fusible tasks (comprising 2,060 subgraphs in total) evaluated under the Error-aware Speedup Score (ES_t), a metric unifying correctness, stability, and performance, together with layered integrity defenses against systematic LLM exploitation. Experiments reveal that PassBench is both highly discriminative and genuinely unsaturated: the best frontier model trails TorchInductor by 37% in aggregate, yet on individual subgraphs LLMs achieve up to 3x speedup over the same compiler, indicating that the bottleneck is consistency, not capability. Fine-tuning a small model on merely ~4K PassNet trajectories yields a 2.67x improvement approaching frontier-model performance, demonstrating substantial headroom and validating PassNet as live training infrastructure for advancing LLM-driven compiler optimization. All data, benchmarks, and tooling are publicly available.
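To make the "pass generation" abstraction concrete, here is a minimal sketch of what a fusion pass looks like, written against a toy graph representation in plain Python. It is an illustration only: `Node`, `fuse_add_relu`, and `evaluate` are hypothetical names invented for this example, and the toy IR is an assumption that stands in for a real compiler graph. It is not PassNet's tooling or TorchInductor's pass interface. The pass rewrites `relu(add(a, b))` into a single fused node, and a small reference interpreter checks that the rewrite preserves results, loosely mirroring the correctness side of benchmark evaluation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """One node of a toy computational graph (hypothetical IR)."""
    name: str                      # name of the value this node produces
    op: str                        # "input", "add", "relu", or "add_relu"
    inputs: Tuple[str, ...] = ()   # names of the values it consumes


def fuse_add_relu(graph: List[Node]) -> List[Node]:
    """Rewrite relu(add(a, b)) into a single "add_relu" node.

    An add is fused only when the relu is its sole consumer, so no other
    node loses a value it still needs.
    """
    by_name = {n.name: n for n in graph}

    # Count how many times each value is consumed.
    uses: Dict[str, int] = {}
    for n in graph:
        for i in n.inputs:
            uses[i] = uses.get(i, 0) + 1

    fused_away = set()               # add nodes absorbed into a fused node
    rewritten: Dict[str, Node] = {}  # relu name -> replacement node
    for n in graph:
        if n.op == "relu" and len(n.inputs) == 1:
            producer = by_name.get(n.inputs[0])
            if (producer is not None and producer.op == "add"
                    and uses[producer.name] == 1):
                rewritten[n.name] = Node(n.name, "add_relu", producer.inputs)
                fused_away.add(producer.name)

    # The fused node keeps the relu's position, so topological order holds.
    return [rewritten.get(n.name, n) for n in graph if n.name not in fused_away]


def evaluate(graph: List[Node], feeds: Dict[str, float]) -> Dict[str, float]:
    """Reference interpreter used to check that a pass preserves results."""
    env = dict(feeds)
    for n in graph:
        if n.op == "input":
            continue
        args = [env[i] for i in n.inputs]
        if n.op == "add":
            env[n.name] = args[0] + args[1]
        elif n.op == "relu":
            env[n.name] = max(args[0], 0.0)
        elif n.op == "add_relu":
            env[n.name] = max(args[0] + args[1], 0.0)
        else:
            raise ValueError(f"unknown op: {n.op}")
    return env


example = [
    Node("x", "input"),
    Node("y", "input"),
    Node("s", "add", ("x", "y")),
    Node("r", "relu", ("s",)),     # s has one consumer: fusible
    Node("t", "add", ("r", "x")),
    Node("u", "relu", ("t",)),     # t is also read by v: not fusible
    Node("v", "add", ("t", "y")),
]
optimized = fuse_add_relu(example)
```

In this example the first add/relu pair is fused, while the second is left alone because its intermediate value has another consumer. Legality conditions of this kind are what a generated pass has to get right on every subgraph, not just on some of them.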