This paper introduces an online learning framework for survival analysis using a bandit approach under the Cox proportional hazards (Cox PH) model. The authors adapt three canonical bandit algorithms to handle challenges inherent in survival data: staggered entry, delayed feedback, and right censoring. Theoretical regret bounds are provided, and empirical results on simulated and SEER cancer data show effective learning of near-optimal treatment policies.
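For reference, the Cox PH model specifies a subject's hazard as a shared baseline scaled multiplicatively by their covariates (the standard textbook form, stated here for context rather than quoted from the paper):

$$\lambda(t \mid x) = \lambda_0(t)\,\exp(x^\top \beta),$$

where $\lambda_0(t)$ is an unspecified baseline hazard and the coefficient vector $\beta$ is estimated semiparametrically from the partial likelihood.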
Optimizing treatments for time-to-event outcomes just got faster: bandit algorithms can now learn near-optimal treatment policies online from censored survival data.
Survival analysis is a widely used statistical framework for modeling time-to-event data under censoring. Classical methods, such as the Cox proportional hazards (Cox PH) model, offer a semiparametric approach to estimating the effects of covariates on the hazard function. Despite its importance, survival analysis remains largely unexplored in online settings, particularly within the bandit framework, where decisions must be made sequentially to optimize treatments as new data arrive over time. In this work, we take an initial step toward integrating survival analysis into a purely online learning setting under the Cox PH model, addressing key challenges including staggered entry, delayed feedback, and right censoring. We adapt three canonical bandit algorithms to balance exploration and exploitation, and establish sublinear regret bounds. Extensive simulations and semi-real experiments using SEER cancer data demonstrate that our approach enables rapid and effective learning of near-optimal treatment policies.
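To make the online protocol concrete, here is a minimal sketch assuming a two-armed setting where each arm's survival times are exponential, i.e. a Cox PH model with constant baseline hazard, subject to independent right censoring. The arm effects, rates, and the epsilon-greedy policy are illustrative assumptions, not the paper's three algorithms, and staggered entry and delayed feedback are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): two treatment arms with
# exponential survival times, a Cox PH special case with constant
# baseline hazard lambda0 and multiplicative arm effects exp(beta_a).
lambda0 = 0.05                      # baseline hazard
beta = np.array([0.0, -0.7])        # arm 1 lowers the hazard (longer survival)
hazards = lambda0 * np.exp(beta)    # true per-arm hazard rates
censor_rate = 0.02                  # independent right-censoring hazard
horizon = 2000
eps = 0.1                           # epsilon-greedy exploration rate

events = np.zeros(2)                # observed event counts per arm
exposure = np.zeros(2)              # total observed time at risk per arm

for t in range(horizon):
    # Exponential MLE of each arm's hazard under right censoring:
    # events / time-at-risk, smoothed to avoid division by zero.
    hazard_hat = (events + 1.0) / (exposure + 1.0)

    # Epsilon-greedy: mostly pick the arm with the lowest estimated hazard.
    if rng.random() < eps:
        arm = int(rng.integers(2))
    else:
        arm = int(np.argmin(hazard_hat))

    # Simulate a survival time and an independent censoring time;
    # we only observe min(T, C) and the event indicator delta = [T <= C].
    T = rng.exponential(1.0 / hazards[arm])
    C = rng.exponential(1.0 / censor_rate)
    observed, delta = min(T, C), T <= C

    events[arm] += delta
    exposure[arm] += observed

print("estimated hazards:", (events + 1.0) / (exposure + 1.0))
print("true hazards:     ", hazards)
```

The survival-analysis wrinkle is visible in the update step: a censored round contributes its observed time at risk to `exposure` but no event, so censoring reduces information without biasing the events-per-exposure hazard estimate.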