MIT CSAIL | Dana-Farber Cancer Institute | Harvard | Apr 20, 2026 | arXiv:2604.18570

A multimodal and temporal foundation model for virtual patient representations at healthcare system scale

Andrew Zhang, Tong Ding, Sophia J. Wagner, Caiwei Tian, Ming Y. Lu, Rowland Pettit, Joshua E. Lewis, Alexandre Misrahi, Dandan Mo, Long Phi Le, Faisal Mahmood

AI Summary

The authors introduce Apollo, a multimodal temporal foundation model trained on 25 billion records from 7.2 million patients across 28 medical modalities. Apollo learns a unified representation space that integrates over 100 thousand unique medical events together with images and clinical text, and compresses patient care journeys into virtual patient representations. Evaluated on 322 prognosis and retrieval tasks, Apollo embeddings show forecasting potential for new disease onset (up to five years in advance), disease progression, treatment response, treatment-related adverse events, and hospital operations endpoints.

Key Contribution

Imagine a medical "Google" where you can search for similar patients using text, images, and medical history, and predict future health risks years in advance – Apollo brings this closer to reality.

Abstract

Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal hospital records from a major US hospital system, composed of 25 billion records from 7.2 million patients, representing 28 distinct medical modalities and 12 major medical specialties. Apollo learns a unified representation space integrating over 100 thousand unique medical events in our clinical vocabulary as well as images and clinical text. This "atlas of medical concepts" forms a computational substrate for modeling entire patient care journeys comprised of sequences of structured and unstructured events, which are compressed by Apollo into virtual patient representations. To assess the potential of these whole-patient representations, we created 322 prognosis and retrieval tasks from a held-out test set of 1.4 million patients. We demonstrate the generalized clinical forecasting potential of Apollo embeddings, including predicting new disease onset risk up to five years in advance (95 tasks), disease progression (78 tasks), treatment response (59 tasks), risk of treatment-related adverse events (17 tasks), and hospital operations endpoints (12 tasks). Using feature attribution techniques, we show that model predictions align with clinically interpretable multimodal biomarkers. We evaluate semantic similarity search on 61 retrieval tasks, and moreover demonstrate the potential of Apollo as a multimodal medical search engine using text and image queries. Together, these modeling capabilities establish the foundation for computable medicine, where the full context of patient care becomes accessible to computational reasoning.
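To make the idea of semantic similarity search over patient embeddings concrete, the following is a minimal sketch of nearest-neighbor retrieval by cosine similarity. It is an illustration only and not Apollo's implementation: the abstract does not specify the similarity measure or the embedding dimensionality, so cosine similarity, the function names (`cosine_similarity`, `top_k_similar`), the patient identifiers, and the toy 3-dimensional vectors are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, index, k=2):
    """Return the k (patient_id, score) pairs whose embeddings are closest to the query."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real patient representations would be
# produced by the model and have far more dimensions.
patient_index = {
    "patient_a": [0.9, 0.1, 0.0],
    "patient_b": [0.0, 1.0, 0.0],
    "patient_c": [0.8, 0.2, 0.1],
}

# Hypothetical query embedding (for example, derived from a text or image query).
query_embedding = [1.0, 0.0, 0.0]
results = top_k_similar(query_embedding, patient_index, k=2)
```

In this sketch, a text or image query would first be mapped into the same representation space as the patient embeddings, after which retrieval reduces to ranking stored vectors by similarity to the query vector. A brute-force scan like this one is only practical for small collections; at the scale of millions of patients an approximate nearest-neighbor index would likely be needed.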

Architecture Design (Transformers, SSMs, MoE); Data Curation & Synthetic Data; Multimodal Models; Scientific Discovery & Drug Design

