Apr 21, 2026 · arXiv:2604.19028

Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors

Jeongwhan Choi, Jongwoo Kim, Woo-Chang Kang, Noseong Park

AI Summary

NodePFN, a novel approach to node classification, learns posterior predictive distributions (PPDs) by pre-training on thousands of synthetic graphs generated from priors designed to mimic real-world graph properties. This allows the model to generalize to arbitrary graphs without graph-specific training, avoiding the per-graph training that traditional GNNs require. The method employs a dual-branch architecture with context-query attention and local message passing, achieving 71.27% average accuracy across 23 benchmarks.

Key Contribution

Forget training a GNN for every new graph: NodePFN learns universal node classification from synthetic graph priors, generalizing across diverse datasets without graph-specific training.

Abstract

One of the most challenging problems in graph machine learning is generalizing across graphs with diverse properties. Graph neural networks (GNNs) face a fundamental limitation: they require separate training, with labeled data, for each new graph. This requirement prevents universal node classification because graphs are heterogeneous, differing in homophily levels, community structures, and feature distributions across datasets. Inspired by the success of large language models (LLMs) that achieve in-context learning through massive-scale pre-training on diverse datasets, we introduce NodePFN, a universal node classification method that generalizes to arbitrary graphs without graph-specific training. NodePFN learns posterior predictive distributions (PPDs) by training only on thousands of synthetic graphs generated from carefully designed priors. Our synthetic graph generation covers real-world graph properties by using random networks with controllable homophily levels and structural causal models for complex feature-label relationships. We develop a dual-branch architecture combining context-query attention mechanisms with local message passing to enable graph-aware in-context learning. Extensive evaluation on 23 benchmarks demonstrates that a single pre-trained NodePFN achieves 71.27% average accuracy. These results validate that universal graph learning patterns can be effectively learned from synthetic priors, establishing a new paradigm for generalization in node classification.
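To make "random networks with controllable homophily levels" concrete, here is a minimal sketch of one way such a graph could be sampled. It is an illustration only, not the authors' prior: the function names `sample_homophilous_graph` and `edge_homophily`, the rejection-sampling scheme, and all parameters are assumptions, and the structural causal model for features is not covered.

```python
import random


def sample_homophilous_graph(num_nodes, num_classes, num_edges, homophily, seed=0):
    """Sample node labels and undirected edges with a target edge homophily.

    Hypothetical helper: each edge is first assigned a type (same-label with
    probability `homophily`, different-label otherwise), then a node pair of
    that type is drawn by rejection sampling.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(num_classes) for _ in range(num_nodes)]
    edges = set()
    max_tries = 1000  # cap on rejection attempts per edge, so the loop always ends
    for _ in range(num_edges):
        want_same = rng.random() < homophily
        for _ in range(max_tries):
            u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
            edge = (min(u, v), max(u, v))
            is_same = labels[u] == labels[v]
            if u != v and is_same == want_same and edge not in edges:
                edges.add(edge)
                break
    return labels, sorted(edges)


def edge_homophily(labels, edges):
    """Fraction of edges whose two endpoints share a label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)
```

Sweeping `homophily` from 0 to 1 yields graphs ranging from fully heterophilous to fully homophilous. That range of structure is what a prior over synthetic graphs needs to span if a single pre-trained model is to handle both regimes.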

Architecture Design (Transformers, SSMs, MoE)

Data Curation & Synthetic Data

Citation Metrics

Citations: 0

Influential citations: 0

References: 63

Year: 2026

Venue: N/A
