The authors introduce LEADS, a foundation model for medical literature mining, trained on a large-scale instruction dataset (LEADSInstruct) derived from systematic reviews, clinical trial publications, and trial registries. LEADS outperforms generic LLMs on tasks such as study search, screening, and data extraction, demonstrating the value of domain-specific training. A user study showed that clinicians and researchers using LEADS achieved higher recall in study selection and higher accuracy in data extraction, along with significant time savings.
Clinicians and researchers using LEADS, a foundation model specialized for medical literature, achieved 22.6-26.9% time savings and improved recall and accuracy compared to working alone.
Systematic literature review is essential for evidence-based medicine, requiring comprehensive analysis of clinical trial publications. However, the application of artificial intelligence (AI) models to medical literature mining has been limited by insufficient training and evaluation across broad therapeutic areas and diverse tasks. Here, we present LEADS, an AI foundation model for study search, screening, and data extraction from medical literature. The model is trained on 633,759 instruction data points in LEADSInstruct, curated from 21,335 systematic reviews, 453,625 clinical trial publications, and 27,015 clinical trial registries. We show that LEADS demonstrates consistent improvements over four cutting-edge generic large language models (LLMs) across six tasks. Furthermore, LEADS enhances expert workflows by providing supportive references in response to expert requests, streamlining processes while maintaining high-quality results. A study with 16 clinicians and medical researchers from 14 institutions revealed that experts collaborating with LEADS achieved a recall of 0.81 in study selection, compared to 0.77 for experts working alone, with a time savings of 22.6%. In data extraction tasks, experts using LEADS achieved an accuracy of 0.85 versus 0.80 without LEADS, alongside a 26.9% time savings. These findings highlight the potential of specialized medical literature foundation models to outperform generic models, delivering significant quality and efficiency benefits when integrated into expert workflows for medical literature mining.