This paper benchmarks pathology foundation models (PFMs) for breast cancer survival prediction from whole-slide histopathology images across three independent clinical cohorts. The authors evaluate model representations using a standardized patch-level feature extraction pipeline and a unified survival modeling framework, training on one cohort and evaluating on two external cohorts. H-optimus-1 achieves the strongest survival prediction performance, and second-generation PFMs generally outperform their first-generation counterparts, though the differences among many recent models are modest, which suggests diminishing returns from scaling alone.
Scaling up pathology foundation models doesn't guarantee better survival prediction: a distilled model with fewer than 8% of the parameters can slightly outperform its larger teacher.
Pathology foundation models (PFMs) have recently emerged as powerful pretrained encoders for computational pathology, enabling transfer learning across a wide range of downstream tasks. However, systematic comparisons of these models for clinically meaningful prediction problems remain limited, especially in the context of survival prediction under external validation. In this study, we benchmark widely used and recently proposed PFMs for breast cancer survival prediction from whole-slide histopathology images. Using a standardized pipeline based on patch-level feature extraction and a unified survival modeling framework, we evaluate model representations across three independent clinical cohorts comprising more than 5,400 patients with long-term follow-up. Models are trained on one cohort and evaluated on two independent external cohorts, enabling a rigorous assessment of cross-dataset generalization. Overall, H-optimus-1 achieves the strongest survival prediction performance. More broadly, we observe consistent generational improvements across model families, with second-generation PFMs outperforming their first-generation counterparts. However, absolute performance differences between many recent PFMs remain modest, suggesting diminishing returns from further scaling of pretraining data or model size alone. Notably, the compact distilled model H0-mini slightly outperforms its larger teacher model H-optimus-0, despite using fewer than 8% of the parameters and enabling significantly faster feature extraction. Together, these results provide the first large-scale, externally validated benchmark of PFMs for breast cancer survival prediction, and offer practical guidance for efficient deployment of PFMs in clinical workflows.
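The evaluation protocol described above (patch-level features aggregated per slide, then a survival model scored on external cohorts) can be illustrated with a minimal sketch. The abstract does not state how patch features are aggregated or which metric is reported, so both choices here are assumptions: mean pooling stands in for the aggregation step, and Harrell's concordance index stands in for the survival metric. The function names `mean_pool` and `concordance_index` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def mean_pool(patch_features: Sequence[Sequence[float]]) -> List[float]:
    """Average patch-level embeddings into one slide-level vector.

    Assumption: mean pooling is only one possible aggregation; the
    paper's actual aggregation method is not given in the abstract.
    """
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(p[d] for p in patch_features) / n for d in range(dim)]


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    The pair is concordant when patient i also has the higher risk
    score. Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in this cohort")
    return concordant / comparable


# Toy external-cohort check with made-up numbers (not data from the paper).
slide_vector = mean_pool([[1.0, 2.0], [3.0, 4.0]])
c_index = concordance_index(
    times=[1.0, 2.0, 3.0],
    events=[1, 1, 0],
    risks=[3.0, 2.0, 1.0],
)
```

In a cross-cohort setting like the one described, the risk scores would come from a model fitted on the training cohort only, and the index would be computed separately on each external cohort. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.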