DTU | Apr 7, 2026 | arXiv:2604.05613

Same Graph, Different Likelihoods: Calibration of Autoregressive Graph Generators via Permutation-Equivalent Encodings

Laurits Fredsgaard, Aaron Thomas, Michael Riis Andersen, Mikkel N. Schmidt, Mahito Sugiyama

AI Summary

The paper investigates the calibration of autoregressive graph generators, which define likelihoods through a sequential graph-construction process. It introduces Linearization Uncertainty (LU), the coefficient of variation of the negative log-likelihood (NLL) across equivalent linearizations of the same graph, to assess whether a model has learned the underlying graph structure or only its training linearization. Experiments show that models trained with biased linearizations have expected calibration error (ECE) two orders of magnitude higher under random permutation, and that on QM9 LU predicts molecular stability far better than NLL (AUC 0.85 vs. 0.43). Together, these results highlight the importance of permutation-based evaluation.
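As a rough illustration of the LU metric, the sketch below computes the coefficient of variation (standard deviation divided by mean) of one graph's NLL values across several equivalent linearizations. The function name `linearization_uncertainty`, the use of the population standard deviation, and the assumption that each input is a per-sequence total NLL are illustrative choices, not taken from the authors' code.

```python
import math
from typing import Sequence


def linearization_uncertainty(nlls: Sequence[float]) -> float:
    """Coefficient of variation of one graph's NLL across equivalent linearizations.

    Hypothetical helper. nlls: total NLL the model assigns to each of K >= 2
    equivalent linearizations (e.g. SENT sequences from different node
    orderings) of the SAME graph. A perfectly consistent model gives 0.0.
    """
    k = len(nlls)
    if k < 2:
        raise ValueError("need at least two linearizations of the graph")
    mean = sum(nlls) / k
    if mean <= 0:
        # The CV is only meaningful for a positive mean; NLLs of discrete sequences are >= 0.
        raise ValueError("mean NLL must be positive")
    # Population standard deviation (an assumption; a sample estimator would divide by k - 1).
    std = math.sqrt(sum((x - mean) ** 2 for x in nlls) / k)
    return std / mean


if __name__ == "__main__":
    # Hypothetical NLL values for linearizations of a single graph.
    print(linearization_uncertainty([50.0, 50.0, 50.0]))  # 0.0
    print(linearization_uncertainty([40.0, 60.0]))  # 0.2
```

Ranking graphs by this value flags those whose likelihood depends most on the chosen ordering. How many linearizations to sample per graph is left as a free parameter in this sketch.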

Key Contribution

Autoregressive graph generators trained on a biased linearization learn that arbitrary traversal order rather than the underlying graph structure.

Abstract

Autoregressive graph generators define likelihoods via a sequential construction process, but these likelihoods are only meaningful if they are consistent across all linearizations of the same graph. Segmented Eulerian Neighborhood Trails (SENT), a recent linearization method, converts graphs into sequences that can be perfectly decoded and efficiently processed by language models, but admits multiple equivalent linearizations of the same graph. We quantify violations of this consistency in the assigned negative log-likelihood (NLL) using the coefficient of variation across equivalent linearizations, which we call Linearization Uncertainty (LU). Training transformers under four linearization strategies on two datasets, we show that biased orderings achieve lower NLL on their native order but exhibit expected calibration error (ECE) two orders of magnitude higher under random permutation, indicating that these models have learned their training linearization rather than the underlying graph. On the molecular graph benchmark QM9, NLL for generated graphs is negatively correlated with molecular stability (AUC $=0.43$), while LU achieves AUC $=0.85$, suggesting that permutation-based evaluation provides a more reliable quality check for generated molecules. Code is available at https://github.com/lauritsf/linearization-uncertainty.
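The stability comparison can be read as a binary-classification check: each generated molecule receives a score, and ROC AUC measures how often a stable molecule is ranked above an unstable one (0.5 is chance; below 0.5 means the score ranks in the wrong direction more often than not). The minimal sketch below computes AUC from pairwise comparisons, counting ties as half. It assumes, hypothetically, that lower LU signals a stable molecule, so LU is negated to form the score; the function name `roc_auc` and the toy values are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive (label 1) outscores a random negative (label 0).

    Pairwise O(P * N) version of the Mann-Whitney statistic; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up LU values and stability labels (1 = stable, 0 = unstable).
    toy_lu = [0.01, 0.12, 0.05, 0.20]
    toy_stable = [1, 0, 1, 1]
    # Assumption: lower LU should indicate a stable molecule, so negate it.
    auc = roc_auc([-x for x in toy_lu], toy_stable)
    print(round(auc, 3))  # 0.667
```

The same function applies to an NLL-based score; only the per-molecule score changes, which is what makes the two AUC values directly comparable.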

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 23

Year: 2026

Venue: N/A
