CAS | Jun 8, 2026 | arXiv:2606.09312

Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search

Haolin Pan, Lianghong Huang, Xvlin Zhou, Mingjie Xing, Yanjun Wu

AI Summary

This paper introduces a world-model-inspired evaluator for tensor program optimization that models schedule evaluation as action-conditioned latent dynamics over program states. Existing learned cost models score each candidate as a static code snapshot, which ignores dependencies between scheduling actions. This method instead starts from the initial program and rolls out the scheduling actions in a continuous latent space with a lightweight transition model, which avoids expensive AST mutation and repeated code encoding. Implemented in TVM AutoScheduler, it improves representative-subgraph latency over Ansor by 1.37× on GPU and 1.54× on CPU under the same 64-trial budget. It also matches Ansor-10K to within 2.2% (geometric mean) while using 10× fewer measurements.

Key Contribution

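The evaluator accelerates full-model inference by 4.61× and 3.67× (geometric mean) over PyTorch and PyTorch-opt (cuDNN), respectively. It also comes within 2.2% of Ansor-10K using 10× fewer measurements, which could substantially reduce the cost of tensor program search.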

Abstract

Tensor program optimization is essential for modern machine learning systems, but its search space is enormous. Existing auto-schedulers reduce measurement cost with learned cost models, yet they usually evaluate each candidate as a static code snapshot, ignoring the schedule trajectory that produced it. This makes them insensitive to action dependencies and vulnerable to superficial code variations. We propose a world-model-inspired evaluator that models schedule evaluation as action-conditioned latent dynamics over program states. Starting from the initial program, it rolls out scheduling actions in a continuous latent space with a lightweight transition model, avoiding expensive AST mutation and repeated code encoding. The final dynamic representation is combined with action and hardware features to rank candidates. Implemented in TVM AutoScheduler, our method improves representative-subgraph latency over Ansor by 1.37× on GPU and 1.54× on CPU under the same 64-trial budget. It also matches Ansor-10K within 2.2% geometric mean using 10× fewer measurements, and accelerates full-model inference over PyTorch/PyTorch-opt (cuDNN) by 4.61×/3.67× geometric mean.
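The abstract describes the evaluator only at a high level. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The tanh recurrence, the dimensions, the one-hot action encodings, the random (untrained) weights, the linear ranking head, and names such as `LatentDynamicsEvaluator`, `rollout`, and `rank` are all hypothetical. The sketch illustrates two ideas from the abstract. First, scheduling actions are applied to a latent state rather than to an AST. Second, the final latent state is combined with action and hardware features to rank candidates.

```python
"""Hypothetical sketch of an action-conditioned latent-dynamics evaluator.

Assumptions (not from the paper): tanh transition, dimensions, one-hot
actions, random untrained weights, and a linear ranking head.
"""
import math
import random
from typing import List, Sequence

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> List[Vector]:
    """Gaussian weights scaled by 1/sqrt(cols); stands in for trained weights."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LatentDynamicsEvaluator:
    """Scores a schedule by rolling its actions out in a latent space."""

    def __init__(self, state_dim: int, action_dim: int, hw_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.action_dim = action_dim
        self.w_z = rand_matrix(rng, state_dim, state_dim)   # latent -> latent
        self.w_a = rand_matrix(rng, state_dim, action_dim)  # action -> latent
        self.bias = [0.0] * state_dim
        # Linear head over [final latent | mean action | hardware features].
        self.head = rand_matrix(rng, 1, state_dim + action_dim + hw_dim)[0]

    def step(self, z: Vector, a: Vector) -> Vector:
        """One transition: z_{t+1} = tanh(W_z z_t + W_a a_t + b)."""
        pre = [p + q + c for p, q, c in zip(matvec(self.w_z, z), matvec(self.w_a, a), self.bias)]
        return [math.tanh(x) for x in pre]

    def rollout(self, z0: Vector, actions: Sequence[Vector]) -> Vector:
        """Apply actions in order to the latent state; no code is regenerated."""
        z = list(z0)
        for a in actions:
            z = self.step(z, a)
        return z

    def score(self, z0: Vector, actions: Sequence[Vector], hw: Vector) -> float:
        """Combine the final latent state with action and hardware features."""
        z_final = self.rollout(z0, actions)
        n = max(len(actions), 1)
        mean_action = [sum(a[i] for a in actions) / n for i in range(self.action_dim)]
        features = z_final + mean_action + list(hw)
        return sum(w * f for w, f in zip(self.head, features))

    def rank(self, z0: Vector, candidates: Sequence[Sequence[Vector]], hw: Vector) -> List[int]:
        """Return candidate indices sorted from highest to lowest score."""
        scores = [self.score(z0, acts, hw) for acts in candidates]
        return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def one_hot(i: int, n: int) -> Vector:
    return [1.0 if j == i else 0.0 for j in range(n)]


# Usage with hypothetical actions, program embedding, and hardware features.
evaluator = LatentDynamicsEvaluator(state_dim=8, action_dim=4, hw_dim=2)
z0 = [0.1] * 8
split, reorder, fuse, unroll = (one_hot(i, 4) for i in range(4))
candidates = [[split, reorder], [reorder, split], [split, fuse, unroll]]
hw = [1.0, 0.5]
order = evaluator.rank(z0, candidates, hw)
```

The latent state is updated sequentially, so `[split, reorder]` and `[reorder, split]` produce different final representations. The order-invariant mean-action feature alone could not tell these two candidates apart. In a real evaluator the weights would be fit to measured latencies. Here they are random, so the ranking demonstrates only the data flow, not meaningful predictions.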

Code Generation & Program Synthesis | World Models & Planning

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
