Mar 12, 2026 · arXiv:2603.12037

Frequentist Consistency of Prior-Data Fitted Networks for Causal Inference

Valentyn Melnychuk, Vahid Balazadeh, Stefan Feuerriegel, Rahul G. Krishnan

AI Summary

This paper analyzes the frequentist consistency of prior-data fitted networks (PFNs) for estimating the average treatment effect (ATE) in causal inference. It demonstrates that existing PFNs, when viewed as Bayesian ATE estimators, can suffer from prior-induced confounding bias, preventing frequentist consistency. To address this, the authors propose a calibration procedure based on a one-step posterior correction (OSPC) using martingale posteriors, proving that it restores consistency and yields a semi-parametric Bernstein-von Mises theorem.

Key Contribution

PFNs, despite their strong empirical performance in causal inference, can suffer from prior-induced confounding bias, but a simple one-step posterior correction can restore frequentist consistency.

Abstract

Foundation models based on prior-data fitted networks (PFNs) have shown strong empirical performance in causal inference by framing the task as an in-context learning problem. However, it is unclear whether PFN-based causal estimators provide uncertainty quantification that is consistent with classical frequentist estimators. In this work, we address this gap by analyzing the frequentist consistency of PFN-based estimators for the average treatment effect (ATE). (1) We show that existing PFNs, when interpreted as Bayesian ATE estimators, can exhibit prior-induced confounding bias: the prior is not asymptotically overwritten by data, which, in turn, prevents frequentist consistency. (2) As a remedy, we suggest employing a calibration procedure based on a one-step posterior correction (OSPC). We show that the OSPC helps to restore frequentist consistency and can yield a semi-parametric Bernstein-von Mises theorem for calibrated PFNs (i.e., both the calibrated PFN-based estimators and the classical semi-parametric efficient estimators converge in distribution with growing data size). (3) Finally, we implement OSPC through tailoring martingale posteriors on top of the PFNs. In this way, we are able to recover functional nuisance posteriors from PFNs, required by the OSPC. In multiple (semi-)synthetic experiments, PFNs calibrated with our martingale posterior OSPC produce ATE uncertainty that (i) asymptotically matches frequentist uncertainty and (ii) is well calibrated in finite samples in comparison to other Bayesian ATE estimators.
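For orientation, the "classical semi-parametric efficient estimators" mentioned in the abstract are commonly instantiated for the ATE by the one-step (augmented inverse-probability-weighted, AIPW) estimator. The abstract does not say which estimator the authors compare against, so the sketch below is an assumption for illustration only, and it is not the paper's OSPC or its martingale-posterior implementation. The function name `aipw_ate` and the synthetic data-generating process are hypothetical, and oracle nuisance functions are used so the example stays self-contained.

```python
import numpy as np


def aipw_ate(y, a, mu0, mu1, pi):
    """One-step (AIPW) estimate of the ATE and its standard error.

    y   : observed outcomes
    a   : binary treatment indicators (0 or 1)
    mu0 : estimated E[Y | X, A=0] evaluated at each unit's covariates
    mu1 : estimated E[Y | X, A=1] evaluated at each unit's covariates
    pi  : estimated propensity scores P(A=1 | X), strictly inside (0, 1)
    """
    y, a, mu0, mu1, pi = (np.asarray(v, dtype=float) for v in (y, a, mu0, mu1, pi))
    # Plug-in part: difference of the two outcome regressions.
    plug_in = mu1 - mu0
    # One-step correction: inverse-probability-weighted residuals.
    correction = a * (y - mu1) / pi - (1.0 - a) * (y - mu0) / (1.0 - pi)
    # Uncentered efficient-influence-function values, one per unit.
    phi = plug_in + correction
    ate = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(phi.size)
    return ate, se


# Hypothetical synthetic example with a true ATE of 2.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-x))        # treatment depends on x (confounding)
a = rng.binomial(1, pi_true)
y = x + 2.0 * a + rng.normal(size=n)      # outcome depends on x and treatment

# Oracle nuisances, used here only to keep the sketch short.
ate_hat, se_hat = aipw_ate(y, a, mu0=x, mu1=x + 2.0, pi=pi_true)
ci = (ate_hat - 1.96 * se_hat, ate_hat + 1.96 * se_hat)
print(f"ATE estimate: {ate_hat:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In practice the nuisances `mu0`, `mu1`, and `pi` would be estimated from data, typically with cross-fitting. The interval produced here is the kind of frequentist uncertainty that, according to the abstract, the calibrated PFN posterior is meant to match asymptotically.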

Eval Frameworks & Benchmarks · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
