IIT Madras | May 28, 2026 | arXiv:2605.30275

Digitally enriching a screening population for pancreatic cancer using routine blood-based measures and clinical histories

C. Varghese, L. Y. Li-Han, Richa Bisht, Ellen L. Larson, Frank Lee, Ryan M. Carr, T. Bekaii-Saab, Shounak Majumder, John D Halamka, M. Truty, Ajit H. Goenka, Hojjat Salehinejad, Cornelius A. Thiels

AI Summary

This paper introduces a Transformer-based neural network that uses longitudinal clinical histories and blood test data to predict pancreatic cancer risk with multi-year lead times. The model, trained on a cohort of over 180,000 individuals, risk-stratifies populations for targeted screening, achieving AUCs of 0.837, 0.797, and 0.760 for 1-, 2-, and 3-year prediction horizons, respectively. The risk predictions are well calibrated and can be transported across settings through a prevalence update, supporting a digital enrichment tool for population-level pancreatic cancer screening.

Key Contribution

A Transformer trained on routine blood tests and clinical histories can predict pancreatic cancer risk up to three years before diagnosis, laying the groundwork for targeted population-level screening.

Abstract

Earlier detection of pancreatic cancer is key to enabling wider access to curative treatment and reducing cancer deaths; however, screening is presently not viable. Latent indicators of pathology are evident in an individual's disease and blood test trajectories and may predict the development of pancreatic cancer. Longitudinal sequences of coded diagnoses and blood test values accrued by patients throughout their clinical interactions were used to train a custom Transformer-based neural network with a multi-head attention mechanism to predict risk of pancreatic cancer with a multi-year lead time and risk-stratify populations for targeted screening. The cohort comprised 6,017 adults with pancreatic cancer and 177,081 controls (overall median age 75, 45% female) with a median of 12 years (interquartile range 6.9-16.2) of medical history prior to pancreatic cancer diagnosis. External validation via leave-one-site-out, out-of-sample testing predicting pancreatic cancer 1, 2, and 3 years prior to diagnosis demonstrated mean area under the receiver operating characteristic curve of 0.837 (95% confidence interval 0.827-0.848), 0.797 (95% confidence interval 0.782-0.813), and 0.760 (95% confidence interval 0.745-0.776), respectively. Estimated pancreatic cancer risks were well calibrated (calibration plot slope 1.08, intercept -0.077; Brier score 0.025), and a Bayesian population pancreatic cancer prevalence update allows estimated cancer risk outputs to be transportable across settings. At testing, a screening threshold of >3.3% risk of pancreatic cancer within 1 year offered a diagnostic odds ratio of 18.2. Our work therefore lays the foundation for a first population-level digital enrichment tool to widen access to curative-intent management of pancreatic cancer.
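
The abstract mentions a Bayesian prevalence update and a diagnostic odds ratio without giving formulas. The Python sketch below illustrates both ideas under stated assumptions: it uses the standard prior-shift correction on the odds scale, which may differ from the authors' exact procedure, and the textbook definition of the diagnostic odds ratio from a 2x2 table. The function names, the target prevalence, and the confusion-matrix counts are hypothetical and chosen only for illustration; the cohort prevalence is derived from the case and control counts reported in the abstract.

```python
def adjust_risk_for_prevalence(p_model, train_prevalence, target_prevalence):
    """Rescale a predicted risk from the training prevalence to a target prevalence.

    Assumption: the standard prior-shift correction, in which the model's
    odds are multiplied by the ratio of target prior odds to training
    prior odds. This is not confirmed to be the paper's exact method.
    """
    for name, value in (("p_model", p_model),
                        ("train_prevalence", train_prevalence),
                        ("target_prevalence", target_prevalence)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")

    model_odds = p_model / (1.0 - p_model)
    train_odds = train_prevalence / (1.0 - train_prevalence)
    target_odds = target_prevalence / (1.0 - target_prevalence)

    adjusted_odds = model_odds * (target_odds / train_odds)
    return adjusted_odds / (1.0 + adjusted_odds)


def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Diagnostic odds ratio from a 2x2 table: (TP * TN) / (FP * FN)."""
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


if __name__ == "__main__":
    # Cohort prevalence implied by the abstract: 6,017 cases, 177,081 controls.
    cohort_prevalence = 6017 / (6017 + 177081)

    # Hypothetical target-population prevalence (illustrative only).
    target_prevalence = 0.001

    adjusted = adjust_risk_for_prevalence(0.10, cohort_prevalence, target_prevalence)
    print(f"cohort prevalence: {cohort_prevalence:.4f}")
    print(f"adjusted risk for a 10% model output: {adjusted:.5f}")

    # Hypothetical confusion-matrix counts (not from the paper).
    print(f"example DOR: {diagnostic_odds_ratio(tp=80, fp=100, fn=20, tn=900):.1f}")
```

Working on the odds scale keeps the adjusted value a valid probability and leaves the ranking of individuals unchanged, so discrimination metrics such as AUC are unaffected by the update while absolute risks are rescaled to the local population.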

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 58

Year: 2026

Venue: N/A
