This paper introduces learnable graph patches to address the challenge of feature heterogeneity in graph data, which has hindered the transferability of graph models across different datasets. By decomposing graphs into semantic units and employing a patch encoder and aggregator, the proposed framework effectively mines transferable information across domains. Empirical results demonstrate that this approach enhances performance on various downstream tasks and benefits from increased pre-training data volume.
Learnable graph patches let a single pre-trained graph model transfer across datasets with heterogeneous features, improving performance on downstream tasks.
In recent years, the rapid development of foundation models and graph pre-training technologies has spurred increasing interest in constructing a universal pre-trained graph model, or Graph Foundation Model (GFM). However, existing models cannot handle feature heterogeneity in graph data when no textual information is available, which hinders the transferability of graph models across datasets. To bridge this gap, we propose the concept of learnable graph patches, which we regard as the smallest semantic units of any graph data. We decompose a graph into learnable graph patches by unfolding the node features and constructing the corresponding patch structures separately. We then design a framework that mines transferable information from graph data across domains. Specifically, after extracting graph patches, we use a patch encoder to extract knowledge from each unit and a patch aggregator to learn how the units are combined into a whole. Because the model is domain-agnostic, it can be applied to downstream data from different domains. Furthermore, we analyze the connection between our method and existing graph models, as well as the transferability of the node embeddings it generates. Empirically, our method can be pre-trained on multi-domain graphs and shows improved performance across various downstream datasets and tasks. Moreover, we observe consistent improvement in downstream performance as the volume of pre-training data increases.
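The abstract does not spell out the architecture, so the following is only a rough sketch of one way to read the unfold, encode, and aggregate steps. It is not the paper's implementation. It assumes that each feature dimension becomes its own one-channel patch, that every patch reuses the graph's normalized adjacency as its structure, that the encoder is a single propagation step with a shared weight, and that the aggregator is a plain mean. All function names are hypothetical.

```python
import numpy as np

def unfold_features(x):
    """Split an (n_nodes, n_feats) matrix into n_feats patches of shape (n_nodes, 1).

    Assumption: one patch per feature dimension.
    """
    return [x[:, j:j + 1] for j in range(x.shape[1])]

def normalized_adjacency(adj):
    """Add self-loops and normalize symmetrically: D^-1/2 (A + I) D^-1/2."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def encode_patch(patch, a_norm, w):
    """Hypothetical patch encoder: one propagation step, shared linear map, ReLU."""
    return np.maximum(a_norm @ patch @ w, 0.0)

def aggregate(patch_embeddings):
    """Hypothetical patch aggregator: mean over patches (a learned one would replace this)."""
    return np.mean(np.stack(patch_embeddings, axis=0), axis=0)

def embed(x, adj, w):
    """Node embeddings of shape (n_nodes, hidden) for any input feature width."""
    a_norm = normalized_adjacency(adj)
    return aggregate([encode_patch(p, a_norm, w) for p in unfold_features(x)])

# The same weight w (shape (1, hidden)) is reused on graphs with different feature widths.
rng = np.random.default_rng(0)
w = rng.normal(size=(1, 8))
adj_small = np.array([[0.0, 1.0], [1.0, 0.0]])
adj_path = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
z_small = embed(rng.normal(size=(2, 3)), adj_small, w)  # 3 features per node
z_path = embed(rng.normal(size=(3, 5)), adj_path, w)    # 5 features per node
```

Because the encoder weight acts on a single channel, its shape does not depend on the number of input features. This is what allows the same parameters to be applied to the two graphs above, and it illustrates, under the stated assumptions, why a patch-level model can be domain-agnostic.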