This paper introduces a multimodal machine learning framework for predicting 5-year overall survival in breast cancer patients using clinical, transcriptomic, and copy-number alteration data from the METABRIC cohort. After feature filtering and dimensionality reduction, the study compared elastic-net regularized Cox models (CoxNet) with gradient-boosted survival trees (XGBoost), with particular attention to calibration and fairness. Both models achieved high discrimination on the held-out test set (time-dependent AUC of 0.966 for CoxNet and 0.925 for XGBoost). Fairness diagnostics indicated stable discrimination across age groups, estrogen receptor status, molecular subtypes, and menopausal state.
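Below is a minimal sketch of the filtering and reduction step, assuming a numeric patients × features matrix. It uses PCA as the reduction method. The abstract does not name the reduction technique, so PCA is an assumption. The thresholds `min_variance` and `max_zero_fraction` and the helpers `filter_features` and `pca_reduce` are hypothetical. Both the filter mask and the PCA projection are fit on training rows only, which avoids leaking test-set information.

```python
import numpy as np

def filter_features(X_train, min_variance=0.01, max_zero_fraction=0.9):
    """Return a boolean column mask computed on training rows only.

    A column is kept if its variance exceeds min_variance (variance filter)
    and at most max_zero_fraction of its entries are exactly zero (sparsity
    filter). Both thresholds are hypothetical defaults.
    """
    X_train = np.asarray(X_train, dtype=float)
    variance = X_train.var(axis=0)
    zero_fraction = (X_train == 0).mean(axis=0)
    return (variance > min_variance) & (zero_fraction <= max_zero_fraction)

def pca_reduce(X_train, X_other, n_components):
    """Fit PCA on training rows via SVD and project both row sets onto it."""
    mean = X_train.mean(axis=0)
    centered = X_train - mean
    # Rows of vt are principal axes ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]
    return centered @ axes.T, (X_other - mean) @ axes.T

# Usage on synthetic data: 50 patients x 200 features, 20 of them all zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
X[:, :20] = 0.0
X_train, X_test = X[:40], X[40:]
keep = filter_features(X_train)
Z_train, Z_test = pca_reduce(X_train[:, keep], X_test[:, keep], n_components=5)
print(keep.sum(), Z_train.shape, Z_test.shape)  # 180 (40, 5) (10, 5)
```

With `full_matrices=False`, the SVD cost is driven by the smaller of the two dimensions. That keeps PCA cheap in the p ≫ n setting typical of transcriptomic data.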
Despite the complexity of multimodal cancer data, surprisingly accurate (test AUC > 0.92) and fair 5-year survival predictions are possible using relatively standard ML techniques such as CoxNet and XGBoost.
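For reference, the elastic-net Cox model (CoxNet) is commonly written as the penalized partial-likelihood problem below. The symbols are:

- $x_i$: patient $i$'s feature vector.
- $t_i$: the observed time.
- $\delta_i$: the event indicator.
- $R(t_i) = \{j : t_j \ge t_i\}$: the risk set (Breslow form, ignoring ties).

The $\ell_1$ term drives some coefficients exactly to zero, which is the embedded feature selection mentioned in the abstract. The $\ell_2$ term stabilizes estimates among correlated features. The mixing weight $\alpha$ and penalty strength $\lambda$ would presumably be among the hyperparameters tuned on validation data. Scaling constants (for example $1/n$ versus $2/n$) vary between implementations, so this is a representative form, not necessarily the paper's exact objective.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  -\frac{1}{n}\sum_{i:\,\delta_i = 1}\Big[\, x_i^{\top}\beta
  - \log \sum_{j \in R(t_i)} \exp\!\big(x_j^{\top}\beta\big) \Big]
  + \lambda\Big( \alpha \lVert \beta \rVert_1
  + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Big)
```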
Clinical risk prediction models often underperform in real-world settings due to poor calibration, limited transportability, and subgroup disparities. These challenges are amplified in high-dimensional multimodal cancer datasets, which have complex feature interactions and a p ≫ n structure (far more features than patients). We present a fully reproducible multimodal machine learning framework for 5-year overall survival prediction in breast cancer, integrating clinical variables with high-dimensional transcriptomic and copy-number alteration (CNA) features from the METABRIC cohort. After variance- and sparsity-based filtering and dimensionality reduction, models were trained using stratified train/validation/test splits with validation-based hyperparameter tuning. Two survival approaches were compared: an elastic-net regularized Cox model (CoxNet) and a gradient-boosted survival tree model implemented using XGBoost. CoxNet provides embedded feature selection and stable estimation, whereas XGBoost captures nonlinear effects and higher-order interactions. Performance was assessed using time-dependent area under the ROC curve (AUC), average precision (AP), calibration curves, Brier score, and bootstrapped 95% confidence intervals. CoxNet achieved validation and test AUCs of 0.983 and 0.966, with AP values of 0.901 and 0.804. XGBoost achieved validation and test AUCs of 0.986 and 0.925, with AP values of 0.925 and 0.799. Fairness diagnostics showed stable discrimination across age groups, estrogen receptor status, molecular subtypes, and menopausal state. This work introduces a governance-oriented multimodal survival framework emphasizing calibration, fairness auditing, robustness, and reproducibility for high-dimensional clinical machine learning.
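Below is a minimal NumPy-only evaluation sketch for the 5-year metrics named above: AUC, AP, Brier score, bootstrapped 95% confidence intervals, and per-subgroup AUC. It assumes a simplified binary 5-year label:

- Deaths at or before 5 years are cases.
- Patients followed beyond 5 years are controls.
- Patients censored before 5 years are dropped.

This complete-case approach is an assumption. It differs from inverse-probability-of-censoring-weighted estimators of time-dependent AUC, which retain early-censored patients, and the abstract does not state which estimator was used. All helper names and the synthetic `er_status` labels are hypothetical.

```python
import numpy as np

def five_year_labels(time, event, horizon=5.0):
    """Binary label (1 = death by horizon) plus a mask dropping early-censored rows."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    case = event & (time <= horizon)
    control = time > horizon
    return case.astype(int), case | control

def roc_auc(y, score):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def average_precision(y, score):
    """Mean precision at the rank of each case (no interpolation; ties broken by order)."""
    y_sorted = y[np.argsort(-score, kind="mergesort")]
    precision = np.cumsum(y_sorted) / np.arange(1, len(y_sorted) + 1)
    return precision[y_sorted == 1].mean()

def brier_score(y, prob):
    """Mean squared error between predicted 5-year death probability and outcome."""
    return np.mean((prob - y) ** 2)

def bootstrap_ci(metric, y, score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval, resampling patients."""
    assert 0 < y.sum() < len(y), "need both cases and controls"
    rng = np.random.default_rng(seed)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, len(y), len(y))
        if 0 < y[idx].sum() < len(idx):  # skip single-class resamples
            stats.append(metric(y[idx], score[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y, score), lo, hi

def subgroup_auc(y, score, groups):
    """AUC within each subgroup (each subgroup must contain cases and controls)."""
    return {g: roc_auc(y[groups == g], score[groups == g]) for g in np.unique(groups)}

# Usage on synthetic data (not METABRIC): shorter survival -> higher predicted risk.
rng = np.random.default_rng(1)
n = 400
time = rng.exponential(8.0, n)
event = rng.random(n) < 0.7
prob = 1.0 / (1.0 + np.exp(time - 5.0 + rng.normal(0.0, 1.0, n)))
er_status = rng.integers(0, 2, n)  # hypothetical binary subgroup label

y, usable = five_year_labels(time, event)
y, prob, er_status = y[usable], prob[usable], er_status[usable]
auc, lo, hi = bootstrap_ci(roc_auc, y, prob, n_boot=200)
print(f"AUC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
print(f"AP {average_precision(y, prob):.3f}, Brier {brier_score(y, prob):.3f}")
print(subgroup_auc(y, prob, er_status))
```

The pairwise AUC uses memory proportional to cases × controls. That is acceptable for cohorts of a few thousand patients, though a rank-based formula scales better. A fairness audit in this style compares the per-subgroup AUCs, ideally each with its own bootstrap interval.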