Feb 19, 2026 | arXiv:2602.17402

A Contrastive Variational AutoEncoder for NSCLC Survival Prediction with Missing Modalities

Michele Zanitti, Vanja Miskovic, Francesco Trovò, Alessandra Laura Giulia Pedrocchi, Ming Shen, Yan Kyaw Tun, Arsela Prelaj, Sokol Kosta

AI Summary

The paper introduces a Multimodal Contrastive Variational AutoEncoder (MCVAE) for predicting survival outcomes in non-small cell lung cancer (NSCLC) patients using whole-slide images, bulk transcriptomics, and DNA methylation data, addressing the challenge of missing modalities in real-world clinical datasets. MCVAE employs modality-specific variational encoders, a fusion bottleneck with learned gating, and a multi-task objective combining survival, reconstruction, and cross-modal contrastive losses, along with stochastic modality masking for robustness. Experiments on TCGA-LUAD and TCGA-LUSC datasets demonstrate MCVAE's superior performance and robustness to missing data compared to existing methods, while also revealing that multimodal integration is not universally beneficial.

Key Contribution

A novel multimodal VAE not only beats existing methods for predicting cancer survival with missing data, but also shows that naively combining modalities can hurt performance.

Abstract

Predicting survival outcomes for non-small cell lung cancer (NSCLC) patients is challenging because prognostic features vary across individuals. This task can benefit from the integration of whole-slide images, bulk transcriptomics, and DNA methylation, which offer complementary views of the patient's condition at diagnosis. However, real-world clinical datasets are often incomplete, with entire modalities missing for a significant fraction of patients. State-of-the-art models either build patient-level representations from the available data or use generative models to infer missing modalities, but they lack robustness under severe missingness. We propose a Multimodal Contrastive Variational AutoEncoder (MCVAE) to address this issue: modality-specific variational encoders capture the uncertainty in each data source, and a fusion bottleneck with learned gating mechanisms normalizes the contributions of the modalities that are present. We propose a multi-task objective that combines a survival loss and a reconstruction loss to regularize patient representations, along with a cross-modal contrastive loss that enforces cross-modal alignment in the latent space. During training, we apply stochastic modality masking to improve robustness to arbitrary missingness patterns. Extensive evaluations on the TCGA-LUAD (n=475) and TCGA-LUSC (n=446) datasets demonstrate the efficacy of our approach in predicting disease-specific survival (DSS) and its robustness to severe missingness compared to two state-of-the-art models. Finally, we clarify the value of multimodal integration by testing our model on all subsets of modalities, and find that integration is not always beneficial to the task.
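The abstract does not spell out how the gating or the masking is implemented. As a rough illustration only, the sketch below assumes the gate is a softmax restricted to the modalities that are present, and that masking drops each present modality independently while always keeping at least one. The function names (`mask_modalities`, `gated_fusion`), the modality keys (`wsi`, `rna`, `meth`), and the toy values are hypothetical and are not taken from the paper.

```python
import math
import random


def mask_modalities(present, drop_prob, rng):
    """Randomly hide present modalities, always keeping at least one.

    Assumption: each modality is dropped independently with probability
    drop_prob; the paper may use a different masking scheme.
    """
    kept = [m for m in present if rng.random() >= drop_prob]
    if not kept and present:
        kept = [rng.choice(present)]
    return kept


def gated_fusion(latents, gate_logits, present):
    """Fuse latent vectors with a softmax computed over present modalities only.

    Assumption: gate_logits stand in for the output of a learned gating
    network; here they are plain numbers supplied by the caller.
    """
    if not present:
        raise ValueError("at least one modality must be present")
    top = max(gate_logits[m] for m in present)  # subtract max for stability
    exp = {m: math.exp(gate_logits[m] - top) for m in present}
    total = sum(exp.values())
    weights = {m: exp[m] / total for m in present}
    dim = len(latents[present[0]])
    fused = [sum(weights[m] * latents[m][k] for m in present) for k in range(dim)]
    return fused, weights


# Toy patient whose transcriptomics ("rna") modality is missing.
latents = {"wsi": [1.0, 0.0], "meth": [1.0, 1.0]}
gate_logits = {"wsi": 0.0, "meth": 0.0}
present = ["wsi", "meth"]

fused, weights = gated_fusion(latents, gate_logits, present)

# During training, a random subset of the present modalities would be kept.
kept = mask_modalities(present, drop_prob=0.5, rng=random.Random(0))
```

Because the softmax runs only over the modalities listed in `present`, the weights always sum to one however many modalities are missing, which is one way to read "normalize the contributions from present modalities".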

Computer Vision, Multimodal Models, Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
