This paper introduces a framework for decomposing proper losses into reliability (calibration), information loss, and residual uncertainty components, making explicit the dependence of calibration on the information retained by a predictor. The decomposition identities are derived for arbitrary proper losses and information levels, with a focus on nested information levels to quantify information gain. The framework is applied to analyze post-hoc recalibration, aggregation of calibrated models, and stagewise/boosting constructions, providing explicit forms for Brier and log-loss.
Decomposing a proper loss shows how much information is lost when a predictor compresses the features into a score, offering a new lens for understanding calibration and model aggregation.
Calibration is a conditional property that depends on the information retained by a predictor. We develop decomposition identities for arbitrary proper losses that make this dependence explicit. At any information level $\mathcal A$, the expected loss of an $\mathcal A$-measurable predictor splits into a proper-regret (reliability) term and a conditional entropy (residual uncertainty) term. For nested levels $\mathcal A\subseteq\mathcal B$, a chain decomposition quantifies the information gain from $\mathcal A$ to $\mathcal B$. Applied to classification with features $\boldsymbol{X}$ and score $S=s(\boldsymbol{X})$, this yields a three-term identity: miscalibration, a \emph{grouping} term measuring information loss from $\boldsymbol{X}$ to $S$, and irreducible uncertainty at the feature level. We leverage the framework to analyze post-hoc recalibration, aggregation of calibrated models, and stagewise/boosting constructions, with explicit forms for Brier and log-loss.
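The three-term identity can be checked numerically on a small discrete example. The sketch below is illustrative only and is not taken from the paper: the toy distribution (`p_x`, `eta`, `s`) and all function names are hypothetical. It assumes a binary outcome, the standard forms of the Brier and log-loss divergences and entropies, and probabilities strictly between 0 and 1 so that the logarithms are finite.

```python
import numpy as np

# Hypothetical toy problem: X takes four values, Y is binary.
p_x = np.array([0.1, 0.2, 0.3, 0.4])   # P(X = x)
eta = np.array([0.9, 0.5, 0.3, 0.1])   # P(Y = 1 | X = x)
s = np.array([0.7, 0.7, 0.2, 0.2])     # score s(x); coarsens X to two levels


def calibrated_score(p_x, eta, s):
    """Return c(x) = E[Y | S = s(x)] for every x."""
    c = np.empty_like(eta)
    for v in np.unique(s):
        mask = s == v
        c[mask] = np.sum(p_x[mask] * eta[mask]) / np.sum(p_x[mask])
    return c


def three_terms(p_x, eta, s, divergence, entropy):
    """Miscalibration, grouping, and irreducible terms for one proper loss."""
    c = calibrated_score(p_x, eta, s)
    miscalibration = np.sum(p_x * divergence(c, s))   # S versus E[Y | S]
    grouping = np.sum(p_x * divergence(eta, c))       # E[Y | S] versus E[Y | X]
    irreducible = np.sum(p_x * entropy(eta))          # uncertainty given X
    return miscalibration, grouping, irreducible


# Brier score: squared-error divergence and entropy p(1 - p).
def brier_div(p, q):
    return (p - q) ** 2

def brier_ent(p):
    return p * (1 - p)

def brier_expected(p_x, eta, s):
    return np.sum(p_x * (eta * (1 - s) ** 2 + (1 - eta) * s ** 2))


# Log-loss: Bernoulli KL divergence and Shannon entropy (natural log).
def log_div(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def log_ent(p):
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def log_expected(p_x, eta, s):
    return -np.sum(p_x * (eta * np.log(s) + (1 - eta) * np.log(1 - s)))


brier_parts = three_terms(p_x, eta, s, brier_div, brier_ent)
log_parts = three_terms(p_x, eta, s, log_div, log_ent)

# In each case the expected loss should equal the sum of the three terms.
print(brier_expected(p_x, eta, s), sum(brier_parts))
print(log_expected(p_x, eta, s), sum(log_parts))
```

In this toy setup the grouping term is positive because the score merges feature values with different conditional probabilities; a score that is an injective function of `x` would drive it to zero, leaving only miscalibration and irreducible uncertainty.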