This paper addresses learning cost-optimal sequential testing policies from retrospective data with informative missingness in clinical decision-making. The authors develop a doubly robust Q-learning framework that uses path-specific inverse probability weights and auxiliary contrast models to handle heterogeneous test trajectories. The method yields unbiased policy learning when either the acquisition model or the contrast model is correctly specified. The paper supports this with oracle inequalities, convergence rates, and an application to a prostate cancer cohort study.
Reduce testing costs without compromising predictive accuracy by learning cost-optimal sequential decision policies from retrospective data, even with informative missingness.
Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.
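The double robustness claim in the abstract can be illustrated with a much simpler, single-stage analogue. The sketch below is not the paper's path-specific estimator: it uses the textbook augmented inverse-probability-weighted (AIPW) pseudo-outcome m(x) + R/pi(x) * (Y - m(x)) for one outcome that is missing at random given a covariate. All names (`dr_pseudo_outcome`, `simulate`, `mean_estimate`) and the data-generating model are hypothetical and chosen only for illustration.

```python
import random


def dr_pseudo_outcome(y, observed, propensity, outcome_pred):
    """AIPW-style pseudo-outcome: m(x) + R / pi(x) * (Y - m(x)).

    y may be None when the outcome was not observed (R = 0).
    """
    if not 0.0 < propensity <= 1.0:
        raise ValueError("propensity must lie in (0, 1]")
    # The residual only contributes when the outcome was actually observed.
    residual = (y - outcome_pred) if observed else 0.0
    return outcome_pred + residual / propensity


def simulate(n=200_000, seed=0):
    """Toy data (an assumption): Y = 1 + 2X + noise, observed w.p. 0.2 + 0.6X."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        observed = rng.random() < 0.2 + 0.6 * x
        rows.append((x, y if observed else None, observed))
    return rows


def mean_estimate(rows, propensity_fn, outcome_fn):
    """Average the pseudo-outcomes to estimate E[Y] (true value is 2.0 here)."""
    total = 0.0
    for x, y, observed in rows:
        total += dr_pseudo_outcome(y, observed, propensity_fn(x), outcome_fn(x))
    return total / len(rows)


def true_pi(x):
    return 0.2 + 0.6 * x      # correct acquisition model


def wrong_pi(x):
    return 0.5                # misspecified: ignores x


def true_m(x):
    return 1.0 + 2.0 * x      # correct outcome model


def wrong_m(x):
    return 0.0                # misspecified: ignores x


rows = simulate()
est_weights_ok = mean_estimate(rows, true_pi, wrong_m)   # only weights correct
est_outcome_ok = mean_estimate(rows, wrong_pi, true_m)   # only outcome model correct
est_both_wrong = mean_estimate(rows, wrong_pi, wrong_m)  # neither correct
```

In this toy setup, the first two estimates should land close to the true mean of 2.0, while the third converges to 2.2 and so stays biased. The abstract describes the same either-or guarantee for the sequential, multi-test setting, where the acquisition model plays the role of `true_pi` and the contrast model plays the role of `true_m`.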