This paper compares the performance of several open-source large language models (LLMs) against cTAKES, a traditional NLP system, for extracting tobacco smoking status from hospital discharge summaries. The authors annotated 250 discharge summaries and benchmarked six LLMs (Llama-3, gpt-oss-20B, MedGemma-27B) and cTAKES using weighted F1-score, macro F1-score, and per-class F1-scores. The gpt-oss-20B model achieved noninferior performance compared to cTAKES, suggesting its viability as an open-source alternative for clinical information extraction.
Open-source LLMs can match the performance of established clinical NLP systems like cTAKES for information extraction, opening the door to more accessible and adaptable clinical text processing.
Abstract

Objectives: To compare lightweight open-source large language models (LLMs) with cTAKES, a state-of-the-art natural language processing (NLP) system, on an information extraction task from hospital discharge summaries.

Materials and Methods: Two readers annotated 250 randomly sampled adult discharge summaries (BJC HealthCare, 2018-2023) for tobacco smoking status as "Smoker," "Never smoker," or "Unknown." Six LLMs (Llama-3 [1B-70B], gpt-oss-20B, MedGemma-27B) and cTAKES extracted smoking status from the summaries. Performance was benchmarked against consensus annotations using weighted F1-score, macro F1-score, per-class F1-scores, and a noninferiority test.

Results: Inter-reader agreement was excellent (κ = 0.91). LLM size (2.3-47.3 GB) and inference time (2.5-14.5 s/note) varied. gpt-oss-20B achieved noninferior performance vs cTAKES (F1 = 0.99 vs 0.97; P < .021).

Discussion: The high accuracy and efficiency of gpt-oss-20B support its potential as a practical, open-source alternative to traditional NLP for clinical information extraction.

Conclusion: Lightweight LLMs can be applied across diverse clinical information extraction tasks without task-specific fine-tuning.
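To make the evaluation metrics concrete, here is a minimal pure-Python sketch of per-class, macro, and support-weighted F1 for a three-class task like the one in the paper. This is an illustration of the metrics only, not the authors' code; the toy labels are invented for the example.

```python
from collections import Counter


def f1_scores(y_true, y_pred, labels):
    """Per-class, macro, and support-weighted F1 for a multi-class task."""
    per_class = {}
    support = Counter(y_true)  # number of gold examples per class
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(p == lab and t != lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        per_class[lab] = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    # Macro F1: unweighted mean over classes; weighted F1: mean weighted by support.
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[lab] * support[lab] for lab in labels) / len(y_true)
    return per_class, macro, weighted


# Toy example with the paper's three smoking-status classes (labels invented):
labels = ["Smoker", "Never smoker", "Unknown"]
gold = ["Smoker", "Smoker", "Never smoker", "Unknown"]
pred = ["Smoker", "Never smoker", "Never smoker", "Unknown"]
per_class, macro, weighted = f1_scores(gold, pred, labels)
```

Weighted F1 rewards accuracy on the majority class ("Unknown" often dominates in clinical notes), while macro F1 treats rare classes such as "Never smoker" equally, which is why the paper reports both.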