The paper introduces a Multimodal Contrastive Variational AutoEncoder (MCVAE) for predicting survival outcomes in non-small cell lung cancer (NSCLC) patients from whole-slide images, bulk transcriptomics, and DNA methylation data, addressing the problem of entire modalities missing in real-world clinical datasets. MCVAE uses modality-specific variational encoders, a fusion bottleneck with learned gating, and a multi-task objective combining survival, reconstruction, and cross-modal contrastive losses, together with stochastic modality masking for robustness. Experiments on the TCGA-LUAD and TCGA-LUSC datasets show that MCVAE outperforms two state-of-the-art methods at predicting disease-specific survival and is more robust to severe missingness, while also revealing that multimodal integration is not universally beneficial.
A novel multimodal VAE not only outperforms two state-of-the-art methods at predicting lung cancer survival with missing data, but also shows that adding more modalities does not always improve predictions.
Predicting survival outcomes for non-small cell lung cancer (NSCLC) patients is challenging because prognostic features vary widely between individuals. This task can benefit from integrating whole-slide images, bulk transcriptomics, and DNA methylation, which offer complementary views of the patient's condition at diagnosis. However, real-world clinical datasets are often incomplete, with entire modalities missing for a significant fraction of patients. State-of-the-art models either build patient-level representations from the available data or use generative models to infer the missing modalities, but they lack robustness under severe missingness. We propose a Multimodal Contrastive Variational AutoEncoder (MCVAE) to address this issue: modality-specific variational encoders capture the uncertainty in each data source, and a fusion bottleneck with learned gating mechanisms normalizes the contributions of the modalities that are present. We train the model with a multi-task objective that combines a survival loss and a reconstruction loss to regularize patient representations, along with a cross-modal contrastive loss that enforces alignment between modalities in the latent space. During training, we apply stochastic modality masking to improve robustness to arbitrary missingness patterns. Extensive evaluations on the TCGA-LUAD (n=475) and TCGA-LUSC (n=446) datasets demonstrate the efficacy of our approach in predicting disease-specific survival (DSS) and its robustness to severe missingness compared to two state-of-the-art models. Finally, we clarify the value of multimodal integration by testing our model on all subsets of modalities, finding that integration is not always beneficial to the task.
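The abstract describes the encoders, gated fusion, and modality masking only at a high level. The following NumPy sketch shows one plausible reading, under stated assumptions: each modality encoder outputs a Gaussian mean and log-variance, the learned gate is a softmax over per-modality scores restricted to the modalities that are present, and stochastic masking drops each available modality independently with a fixed probability while keeping at least one. The function names, `p_drop`, and the linear gate vector `w_gate` are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps (the standard VAE reparameterization trick)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def stochastic_modality_mask(present, p_drop, rng):
    """Hypothetical masking: drop each available modality with prob. p_drop, keep >= 1.

    present: bool array (batch, n_modalities), True where the modality was acquired.
    """
    keep = present & (rng.random(present.shape) >= p_drop)
    # Patients that lost every modality get one of their original modalities back.
    for i in np.flatnonzero(~keep.any(axis=1) & present.any(axis=1)):
        keep[i, rng.choice(np.flatnonzero(present[i]))] = True
    return keep


def gated_fusion(z, gate_logits, mask):
    """Weighted sum of per-modality latents with a softmax gate over present modalities.

    z: (batch, n_modalities, latent_dim); gate_logits, mask: (batch, n_modalities).
    Assumes every row of mask has at least one True entry.
    """
    logits = np.where(mask, gate_logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)  # exp(-inf) == 0, so missing modalities get zero weight
    w = w / w.sum(axis=1, keepdims=True)
    return np.einsum("bm,bmd->bd", w, z), w


# Toy batch: 3 patients x 3 modalities (WSI, RNA, methylation), 4-d latents.
batch, n_mod, dim = 3, 3, 4
mu = rng.standard_normal((batch, n_mod, dim))
logvar = np.zeros((batch, n_mod, dim))
z = reparameterize(mu, logvar, rng)
present = np.array([[True, True, True],
                    [True, False, True],
                    [False, False, True]])
mask = stochastic_modality_mask(present, p_drop=0.3, rng=rng)  # training-time only
w_gate = rng.standard_normal(dim)  # hypothetical learned gate parameters
fused, weights = gated_fusion(z, z @ w_gate, mask)
print(fused.shape)  # (3, 4)
```

Because the softmax is taken only over the modalities that are present, the fusion weights sum to one for every patient regardless of which inputs are missing; this is one way to normalize their contributions, and the third patient, with methylation only, receives all of its weight from that single modality.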
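The abstract names the three terms of the multi-task objective but not their exact form. As an assumption, the sketch below uses a Cox negative partial log-likelihood for the survival term, a mean squared reconstruction error plus a Gaussian KL term for the variational part, and a symmetric InfoNCE loss between two modalities' latents for the contrastive term. The helper names and the loss weights (`1.0`, `1e-3`, `0.1`) are hypothetical, and all inputs are random placeholders rather than TCGA data.

```python
import numpy as np


def cox_neg_partial_loglik(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk: predicted log-hazard per patient; time: follow-up time;
    event: 1 if the event (e.g. disease-specific death) was observed, 0 if censored.
    Tied times are not handled specially (no Breslow/Efron correction).
    """
    order = np.argsort(-time)  # descending time: each risk set is a prefix
    r, e = risk[order], event[order]
    log_risk_set = np.logaddexp.accumulate(r)  # log sum_{j: t_j >= t_i} exp(r_j)
    return -np.sum(e * (r - log_risk_set)) / max(e.sum(), 1)


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)


def info_nce(za, zb, temperature=0.1):
    """Symmetric InfoNCE: row i of za and row i of zb (same patient) are positives."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature

    def cross_entropy_diag(l):
        # Cross-entropy of each row's softmax against the diagonal target.
        l = l - l.max(axis=1, keepdims=True)
        return np.mean(np.log(np.exp(l).sum(axis=1)) - np.diag(l))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


# Toy batch of 8 patients; all values are random placeholders.
rng = np.random.default_rng(1)
n, d = 8, 16
time = rng.uniform(1.0, 60.0, n)  # follow-up in months (toy values)
event = rng.integers(0, 2, n)
risk = rng.standard_normal(n)
x = rng.standard_normal((n, 32))
x_hat = x + 0.1 * rng.standard_normal((n, 32))
mu, logvar = rng.standard_normal((n, d)), np.zeros((n, d))
z_rna, z_meth = rng.standard_normal((n, d)), rng.standard_normal((n, d))

l_surv = cox_neg_partial_loglik(risk, time, event)
l_rec = np.mean((x - x_hat) ** 2) + 1e-3 * np.mean(gaussian_kl(mu, logvar))
l_con = info_nce(z_rna, z_meth)
total = l_surv + 1.0 * l_rec + 0.1 * l_con  # hypothetical loss weights
```

In the contrastive term, the two latents of the same patient form the positive pair and the other patients in the batch act as negatives, which is the usual InfoNCE setup; with missing modalities, such a term would only be computed over patients for whom both modalities are present.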