This paper introduces a framework for proper scoring of right-censored survival data, where conventional scoring rules do not apply in their standard form because the event time is only partially observed. By mapping the predictive distribution through the censoring mechanism and scoring the induced observed-data law, the authors derive localized and marginalized scores that recover familiar right-censored likelihood and IPCW-type criteria. In experiments, these scores correctly rank the oracle forecast across several censoring regimes, whereas forecast-dependent plug-in weighted scores can exhibit ranking reversals. A related learning objective for multivariate survival modeling, censored engression, substantially improves over naive training on censored outcomes.
Proper scores for right-censored survival data rank the oracle forecast correctly, avoiding the ranking reversals that forecast-dependent plug-in weighted scores can exhibit.
Proper scoring rules provide a rigorous theoretical basis for the training and evaluation of probabilistic forecasts. However, in the presence of right censoring, the event time is only partially observed, rendering conventional scoring rules inapplicable in their standard form. We propose a framework for proper scoring of right-censored survival outcomes based on a simple idea: first, map the predictive distribution through the censoring mechanism, then apply the underlying proper score on the induced observed-data law. This yields localized scores for fixed censoring times and marginalized scores when the censoring time is random or only partially observed. The resulting construction recovers familiar right-censored likelihood and IPCW-type criteria within a coherent framework, while also yielding right-censored versions of the CRPS, pinball loss, Brier score, and energy score. We show that the marginalized score is proper under conditional independent censoring and strictly proper on the identifiable region. The same principle also leads to censored engression, a sample-based learning objective for multivariate right-censored survival modeling. In experiments, our scores correctly rank the oracle forecast across several censoring regimes, whereas forecast-dependent plug-in weighted scores can exhibit ranking reversals. Censored engression likewise substantially improves over naive training on censored outcomes.
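The "map, then score" idea can be illustrated for the simplest case, a fixed and known censoring time c. The sketch below is one reading of the abstract, not the paper's own formulas or code. It assumes nonnegative event times, so the observed outcome is Y = min(T, c), and the induced observed-data CDF equals the forecast CDF below c and jumps to 1 at c. Applying the ordinary CRPS to that law reduces to integrating the squared difference between the forecast CDF and the step function at y over [0, c]. The names `exp_cdf` and `localized_censored_crps`, the exponential forecasts, and the simulation settings are all hypothetical choices made for illustration.

```python
import numpy as np

def exp_cdf(rate):
    """CDF of a hypothetical exponential forecast with the given rate."""
    return lambda u: 1.0 - np.exp(-rate * u)

def localized_censored_crps(cdf, y, c, n_grid=2001):
    """CRPS of the forecast pushed through Y = min(T, c), evaluated at y <= c.

    The induced CDF is cdf(u) for u < c and 1 for u >= c, so the CRPS
    integrand vanishes beyond c and only [0, c] has to be integrated.
    """
    u = np.linspace(0.0, c, n_grid)
    step = (u >= y).astype(float)           # indicator 1{y <= u}
    integrand = (cdf(u) - step) ** 2
    du = u[1] - u[0]
    # Trapezoidal rule written out by hand to avoid version-specific NumPy helpers.
    return float(np.sum(0.5 * (integrand[:-1] + integrand[1:])) * du)

# Simulated check (assumed setup): true event times are Exp(rate=1),
# censored at the fixed time c = 1.5.
rng = np.random.default_rng(0)
c = 1.5
t = rng.exponential(scale=1.0, size=500)
y = np.minimum(t, c)                        # observed, possibly censored, outcomes

mean_scores = {
    rate: float(np.mean([localized_censored_crps(exp_cdf(rate), yi, c) for yi in y]))
    for rate in (0.3, 1.0, 3.0)
}
best_rate = min(mean_scores, key=mean_scores.get)
print(mean_scores, "lowest mean score at rate", best_rate)
```

Lower scores are better here, so the forecast matching the data-generating rate should come out ahead of the two misspecified candidates. The marginalized scores for random or partially observed censoring times described in the abstract would require averaging over a censoring distribution, which this sketch does not attempt.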