JHU | UPenn | Feb 26, 2026 | arXiv:2602.23360

Model Agreement via Anchoring

Eric Eaton, Surbhi Goel, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

AI Summary

This paper introduces an "anchoring" technique for bounding model disagreement, defined as the expected squared difference in predictions between two models trained on independent samples. The technique bounds disagreement by comparing each model to the average of the two models within the analysis. The authors then apply it to derive disagreement bounds that converge to zero for stacked aggregation, gradient boosting, neural network architecture search, and regression tree training, each with respect to its own natural parameter (number of models, number of iterations, architecture size, and tree depth, respectively).
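To make the role of the average concrete, here is a standard identity for squared loss, given as a sketch of why anchoring to the average can control disagreement. The notation ($f_1$, $f_2$, $\bar f$, $R$, $D$) is introduced here for illustration and is not taken from the paper, whose actual argument may differ. Let $\bar f = \tfrac{1}{2}(f_1 + f_2)$, let $R(f) = \mathbb{E}[(f(x) - y)^2]$, and let $D(f_1, f_2) = \mathbb{E}[(f_1(x) - f_2(x))^2]$. Then:

$$
\begin{aligned}
(f_1(x) - y)^2 + (f_2(x) - y)^2 &= 2\,(\bar f(x) - y)^2 + \tfrac{1}{2}\,(f_1(x) - f_2(x))^2, \\
D(f_1, f_2) &= 2\,\bigl(R(f_1) - R(\bar f)\bigr) + 2\,\bigl(R(f_2) - R(\bar f)\bigr).
\end{aligned}
$$

The first line holds pointwise, and the second follows by taking expectations and rearranging. So if neither model's risk exceeds the risk of the average by more than $\varepsilon$, then $D(f_1, f_2) \le 4\varepsilon$.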

Key Contribution

Model disagreement can be driven to zero for standard methods such as stacking, gradient boosting, neural architecture search, and regression tree training, as the number of models, number of iterations, architecture size, or tree depth increases.

Abstract

Numerous lines of work aim to control $\textit{model disagreement}$ -- the extent to which two machine learning models disagree in their predictions. We adopt a simple and standard notion of model disagreement in real-valued prediction problems, namely the expected squared difference in predictions between two models trained on independent samples, without any coordination of the training processes. We would like to be able to drive disagreement to zero with some natural parameter(s) of the training procedure using analyses that can be applied to existing training methodologies. We develop a simple general technique for proving bounds on independent model disagreement based on $\textit{anchoring}$ to the average of two models within the analysis. We then apply this technique to prove disagreement bounds for four commonly used machine learning algorithms: (1) stacked aggregation over an arbitrary model class (where disagreement is driven to 0 with the number of models $k$ being stacked); (2) gradient boosting (where disagreement is driven to 0 with the number of iterations $k$); (3) neural network training with architecture search (where disagreement is driven to 0 with the size $n$ of the architecture being optimized over); and (4) regression tree training over all regression trees of fixed depth (where disagreement is driven to 0 with the depth $d$ of the tree architecture). For clarity, we work out our initial bounds in the setting of one-dimensional regression with squared error loss -- but then show that all of our results generalize to multi-dimensional regression with any strongly convex loss.
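The disagreement notion in the abstract can be estimated empirically. The sketch below is illustrative only and is not one of the paper's four algorithms: it trains two polynomial least-squares models on independent samples from a hypothetical data distribution (`sample`, `fit_poly`, and all constants are assumptions made here), estimates the expected squared difference of their predictions on fresh data, and checks numerically that the same quantity equals twice the sum of the two models' squared-error risks minus four times the risk of their average.

```python
import numpy as np

def sample(n, rng):
    """Draw n points from a hypothetical distribution: y = sin(3x) + noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + 0.3 * rng.normal(size=n)
    return x, y

def fit_poly(x, y, degree=5):
    """Least-squares polynomial fit; returns a callable predictor."""
    coeffs = np.polyfit(x, y, degree)
    return lambda t: np.polyval(coeffs, t)

def risk(f, x, y):
    """Monte Carlo estimate of the squared-error risk E[(f(x) - y)^2]."""
    return float(np.mean((f(x) - y) ** 2))

def disagreement(f1, f2, x):
    """Monte Carlo estimate of E[(f1(x) - f2(x))^2]."""
    return float(np.mean((f1(x) - f2(x)) ** 2))

rng = np.random.default_rng(0)

# Two models trained on independent samples, with no coordination.
f1 = fit_poly(*sample(2000, rng))
f2 = fit_poly(*sample(2000, rng))

# Fresh evaluation data from the same distribution.
x_test, y_test = sample(100_000, rng)

# The "anchor": the pointwise average of the two models.
f_avg = lambda t: 0.5 * (f1(t) + f2(t))

d_direct = disagreement(f1, f2, x_test)
d_anchor = (2.0 * (risk(f1, x_test, y_test) + risk(f2, x_test, y_test))
            - 4.0 * risk(f_avg, x_test, y_test))

print(f"disagreement (direct):     {d_direct:.6f}")
print(f"disagreement (via anchor): {d_anchor:.6f}")
```

The two printed values agree up to floating-point error because the underlying identity holds pointwise for squared loss. Increasing the training sample size or changing `degree` shows how disagreement responds to the training setup in this toy setting.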

Scalable Oversight & Alignment Theory | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 67

Year: 2026

Venue: N/A
