Stanford HAI, Harvard | Apr 13, 2026 | arXiv:2604.11165

Cost-optimal Sequential Testing via Doubly Robust Q-learning

Doudou Zhou, Dian Jin, Yingye Zheng, Lu Tian, Tianxi Cai

AI Summary

This paper addresses learning cost-optimal sequential testing policies from retrospective data with informative missingness in clinical decision-making. The authors develop a doubly robust Q-learning framework that uses path-specific inverse probability weights and auxiliary contrast models to handle heterogeneous test trajectories. The method yields unbiased policy learning when either the acquisition model or the contrast model is correctly specified; the paper supports it with oracle inequalities, convergence rates, and an application to a prostate cancer cohort study.

Key Contribution

Learns cost-optimal sequential testing policies from retrospective data, even under informative missingness, reducing testing cost without compromising predictive accuracy.

Abstract

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.
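The double robustness property described in the abstract can be illustrated with a minimal single-stage sketch. This is a generic augmented inverse-probability-weighted (AIPW) pseudo-outcome for estimating a mean under missingness at random, not the paper's path-specific estimator. The function names, the simulated data, and the deliberately misspecified models `pi_wrong` and `m_wrong` are all assumptions made for illustration.

```python
import numpy as np

def dr_pseudo_outcome(y, observed, pi_hat, m_hat):
    """Generic AIPW pseudo-outcome: m_hat + observed / pi_hat * (y - m_hat).

    y        : outcome (only used where observed is True)
    observed : boolean indicator that the outcome was acquired
    pi_hat   : modeled acquisition probability
    m_hat    : modeled conditional mean of the outcome
    """
    y_filled = np.where(observed, y, 0.0)  # unobserved values never enter the estimate
    return m_hat + observed / pi_hat * (y_filled - m_hat)

def simulate(n=200_000, seed=0):
    """Hypothetical data: acquisition depends on x, so missingness is informative."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))  # true acquisition probability
    observed = rng.random(n) < pi_true
    y = 1.0 + 2.0 * x + rng.normal(size=n)      # true mean of y is 1.0
    m_true = 1.0 + 2.0 * x                      # true conditional mean of y given x
    return y, observed, pi_true, m_true

y, observed, pi_true, m_true = simulate()
pi_wrong = np.full_like(pi_true, 0.5)  # misspecified acquisition model
m_wrong = np.zeros_like(m_true)        # misspecified outcome model

# Correct acquisition model, wrong outcome model.
est_pi_ok = dr_pseudo_outcome(y, observed, pi_true, m_wrong).mean()
# Wrong acquisition model, correct outcome model.
est_m_ok = dr_pseudo_outcome(y, observed, pi_wrong, m_true).mean()
# Both models wrong: no protection remains.
est_both_wrong = dr_pseudo_outcome(y, observed, pi_wrong, m_wrong).mean()
# Complete-case baseline: average over acquired outcomes only.
complete_case = y[observed].mean()

print(f"pi correct, m wrong : {est_pi_ok:.3f}")
print(f"pi wrong, m correct : {est_m_ok:.3f}")
print(f"both wrong          : {est_both_wrong:.3f}")
print(f"complete case       : {complete_case:.3f}")
```

In this toy setting the first two estimates should land near the true mean of 1.0, while the both-wrong estimate and the complete-case average are biased upward because subjects with larger `x` are more likely to be observed. The sequential, path-specific construction in the paper generalizes this idea across stages; the sketch covers only one stage.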

Recommendation & Information Retrieval, Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
