NUS, SJTU | May 6, 2026 | arXiv:2605.05023

CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

AI Summary

CuBridge is an LLM-based framework that adapts expert-written CUDA attention kernels to support diverse attention variants. It follows a lift-transfer-lower workflow: it lifts expert kernels into an executable intermediate representation (IR), generates and verifies a target IR program from a user-provided PyTorch specification, and reconstructs optimized CUDA code through reference-guided lowering. Across attention variants and GPU platforms, the authors report that CuBridge consistently produces correct kernels and outperforms general frameworks, compiler-based approaches, and prior LLM-based methods.

Key Contribution

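CuBridge shows that an LLM can adapt expert-written CUDA attention kernels to new attention variants through a lift-transfer-lower workflow. The authors report that the resulting kernels are correct and substantially faster than those from general frameworks, compiler-based approaches, and prior LLM-based methods.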

Abstract

Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report unstable correctness and significant performance gaps for complex operators such as attention. We present CuBridge, an LLM-based framework that adapts expert-written attention kernels through a structured lift-transfer-lower workflow. CuBridge starts from expert-written CUDA attention kernels and lifts them into an executable intermediate representation that makes execution orchestration explicit while abstracting low-level CUDA syntax. Given a user-provided PyTorch specification, CuBridge generates and verifies a target IR program, then reconstructs optimized CUDA code via reference-guided lowering. Across diverse attention variants and GPU platforms, CuBridge consistently produces correct kernels and substantially outperforms general frameworks, compiler-based approaches, and prior LLM-based methods.
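To make the "specification, then verification" step concrete, here is a minimal sketch of the idea under stated assumptions. It is not CuBridge's IR, API, or output. Plain Python stands in for the PyTorch specification so the sketch runs without dependencies, and every name in it (`reference_attention`, `tiled_attention`, `causal`, `max_abs_diff`) is hypothetical. The naive function plays the role of the user-provided specification for a causal attention variant, and the tiled online-softmax function plays the role of a candidate implementation whose execution order differs but whose result should match.

```python
import math
import random


def causal(score, i, j):
    """Hypothetical score modifier: query i may only attend to keys j <= i."""
    return score if j <= i else float("-inf")


def reference_attention(q, k, v, score_mod=None):
    """Naive specification: states what to compute, not how to schedule it."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            s = scale * sum(a * b for a, b in zip(qi, kj))
            scores.append(score_mod(s, i, j) if score_mod else s)
        row_max = max(scores)
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, score_mod=None, tile=2):
    """Candidate: same math, but keys are visited in tiles with online softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        running_max, running_sum = float("-inf"), 0.0
        acc = [0.0] * len(v[0])
        for start in range(0, len(k), tile):
            scores = []
            for j in range(start, min(start + tile, len(k))):
                s = scale * sum(a * b for a, b in zip(qi, k[j]))
                scores.append(score_mod(s, i, j) if score_mod else s)
            new_max = max(running_max, max(scores))
            if new_max == float("-inf"):
                continue  # every key seen so far is masked
            rescale = math.exp(running_max - new_max)  # shrink old partial sums
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * rescale + sum(weights)
            acc = [a * rescale
                   + sum(w * v[start + t][d] for t, w in enumerate(weights))
                   for d, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


def max_abs_diff(a, b):
    """Largest elementwise gap between two matrices given as lists of rows."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
q, k, v = ([[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
           for _ in range(3))
error = max_abs_diff(reference_attention(q, k, v, causal),
                     tiled_attention(q, k, v, causal, tile=2))
print("candidate matches specification:", error < 1e-9)
```

The design point is that the specification stays simple and obviously correct, while any restructured candidate is accepted only if it agrees numerically with that specification. A real check would run against GPU kernels with a tolerance suited to the data type.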

Architecture Design (Transformers, SSMs, MoE) | Code Generation & Program Synthesis | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
