Mar 18, 2026 · arXiv:2603.17569

Gaussian Process Limit Reveals Structural Benefits of Graph Transformers

Nil Ayday, Lingchu Yang, Debarghya Ghoshdastidar

AI Summary

This paper analyzes the neural network Gaussian process (NNGP) limits of graph transformers (GAT, Graphormer, Specformer) with infinite width and infinitely many heads to understand their structural advantages over graph convolutional networks (GCNs) for node-level prediction. The authors derive node-level and edge-level kernels across layers, characterizing how node features and graph structure propagate through graph attention layers. A key finding is that graph transformers structurally preserve community information and maintain discriminative node representations in deep layers, thereby preventing oversmoothing. The authors support this result with experiments on synthetic and real-world graphs.

Key Contribution

Graph transformers avoid oversmoothing in deep layers by structurally preserving community information, a theoretical advantage over GCNs revealed through Gaussian process limits.

Abstract

Graph transformers are the state of the art for learning from graph-structured data and are empirically known to avoid several pitfalls of message-passing architectures. However, there is limited theoretical analysis of why these models perform well in practice. In this work, we prove that attention-based architectures have structural benefits over graph convolutional networks in the context of node-level prediction tasks. Specifically, we study the neural network Gaussian process limits of graph transformers (GAT, Graphormer, Specformer) with infinite width and infinite heads, and derive the node-level and edge-level kernels across the layers. Our results characterise how the node features and the graph structure propagate through the graph attention layers. As a specific example, we prove that graph transformers structurally preserve community information and maintain discriminative node representations even in deep layers, thereby preventing oversmoothing. We provide empirical evidence on synthetic and real-world graphs that validates our theoretical insights, for example that integrating informative priors and positional encodings can improve the performance of deep graph transformers.
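The oversmoothing that the abstract contrasts against can be illustrated on the GCN side with a small kernel recursion. The sketch below is not taken from the paper: it assumes a linear activation, unit weight variance, an identity input kernel, and the symmetric normalization with self-loops, under which the infinite-width GCN kernel evolves as K ← S K Sᵀ. The toy graph (two 4-node cliques joined by one edge) and all function names are hypothetical. It tracks the ratio of the second-largest to the largest kernel eigenvalue, which shrinks toward zero as the kernel collapses to rank one and the two communities become indistinguishable. It does not implement the graph transformer kernels derived in the paper.

```python
import numpy as np

def two_clique_graph(k=4):
    """Adjacency of two k-node cliques joined by a single bridge edge (toy example)."""
    n = 2 * k
    A = np.zeros((n, n))
    A[:k, :k] = 1.0
    A[k:, k:] = 1.0
    np.fill_diagonal(A, 0.0)           # no self-loops in the raw graph
    A[k - 1, k] = A[k, k - 1] = 1.0    # bridge between the two communities
    return A

def normalized_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def eigenvalue_ratio(K):
    """Second-largest over largest eigenvalue; near 0 means K is close to rank one."""
    eig = np.linalg.eigvalsh(K)        # eigenvalues in ascending order
    return eig[-2] / eig[-1]

def linear_gcn_kernel_ratios(S, depth):
    """Iterate K <- S K S^T (assumed linear-GCN kernel recursion) from K = I."""
    K = np.eye(S.shape[0])
    ratios = []
    for _ in range(depth):
        K = S @ K @ S.T
        ratios.append(eigenvalue_ratio(K))
    return ratios

S = normalized_adjacency(two_clique_graph())
ratios = linear_gcn_kernel_ratios(S, depth=50)
print(f"layer 1 ratio: {ratios[0]:.4f}, layer 50 ratio: {ratios[-1]:.2e}")
```

In this toy setting the ratio decays geometrically with depth, so the community structure carried by the second eigenvector is progressively washed out of the kernel. The abstract's claim is that the attention-based kernels avoid this collapse.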

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
