Search papers, labs, and topics across Lattice.
This paper provides a theoretical analysis of semi-supervised node regression using graph neural networks (GNNs) with aggregate-and-readout architectures. The authors derive a sharp non-asymptotic risk bound for least-squares estimation with linear graph convolutions and ReLU readouts, explicitly showing how performance depends on the number of labeled nodes and graph structure. They also derive approximation guarantees for graph-smoothing followed by smooth nonlinear readouts, demonstrating convergence rates that align with nonparametric behavior under full supervision and characterizing performance with limited labels.
GNN performance in semi-supervised learning can be precisely quantified with non-asymptotic risk bounds that reveal the interplay between labeled data scarcity and graph structure.
Graph neural networks (GNNs) work remarkably well in semi-supervised node regression, yet a rigorous theory explaining when and why they succeed remains lacking. To address this gap, we study an aggregate-and-readout model that encompasses several common message passing architectures: node features are first propagated over the graph then mapped to responses via a nonlinear function. For least-squares estimation over GNNs with linear graph convolutions and a deep ReLU readout, we prove a sharp non-asymptotic risk bound that separates approximation, stochastic, and optimization errors. The bound makes explicit how performance scales with the fraction of labeled nodes and graph-induced dependence. Approximation guarantees are further derived for graph-smoothing followed by smooth nonlinear readouts, yielding convergence rates that recover classical nonparametric behavior under full supervision while characterizing performance when labels are scarce. Numerical experiments validate our theory, providing a systematic framework for understanding GNN performance and limitations.