This study introduces a clinically grounded framework for evaluating the privacy risks of medical language models (LMs) by measuring leakage across tiers of adversarial access. The framework measures both verbatim memorization of patient-specific text and semantic leakage of sensitive diagnoses, and shows that routine encounter metadata alone elicits high rates of verbatim memorization and sensitive-diagnosis recovery. Exact-match memorization rates, however, may overstate actual disclosure risk because a substantial share of memorized text reflects templated documentation, which underscores the need for careful, context-aware privacy evaluation of models trained on clinical data.
Routine encounter metadata can trigger high rates of verbatim memorization and sensitive diagnosis recovery in medical LMs, raising serious privacy concerns.
Medical language models (LMs) can memorize and reproduce protected health information, but privacy evaluations often focus on recovery of training text rather than disclosure under realistic threat models. We introduce a clinically grounded framework that evaluates leakage along a graded axis of adversarial access, ranging from publicly inferable demographics to leaked note fragments. At each tier, we measure verbatim memorization of patient-specific text and semantic leakage of sensitive diagnoses. Applying the framework to an LM pretrained on 378k clinical notes, we find that routine encounter metadata (i.e., name, date of birth, provider, practice, visit date) elicits high rates of both verbatim memorization across a patient's timeline and sensitive-diagnosis recovery (AUROC 0.91 for abortion, 0.81 for HIV). At the same time, exact-match memorization can overstate disclosure: 36% of memorized tokens reflect templated documentation. Our work highlights the risks of training on longitudinal clinical data and provides a practical framework for contextual privacy evaluation of medical LMs.
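The abstract names three quantities (an exact-match memorization rate, the share of memorized tokens that are templated, and an AUROC for sensitive-diagnosis recovery) without specifying how each is computed. The sketch below shows one plausible way to operationalize them, assuming tokenized generated continuations paired with the true training suffixes, a set of "template" n-grams (for example, phrases shared across many patients' notes), and per-patient model scores for a diagnosis paired with ground-truth labels. The function names, the n-gram template heuristic, and the pairwise AUROC formulation are hypothetical illustrations, not the authors' method.

```python
from typing import Sequence

# Hypothetical helpers illustrating the metrics named in the abstract.
# They are NOT the authors' implementation; inputs are assumed tokenized.


def exact_match_rate(generated: Sequence[Sequence[str]],
                     reference: Sequence[Sequence[str]]) -> float:
    """Fraction of prompts whose generated continuation reproduces the
    true training suffix token-for-token (verbatim memorization)."""
    if len(generated) != len(reference):
        raise ValueError("generated and reference must be the same length")
    if not reference:
        return 0.0
    hits = sum(1 for g, r in zip(generated, reference) if list(g) == list(r))
    return hits / len(reference)


def templated_token_fraction(memorized: Sequence[Sequence[str]],
                             template_ngrams: set,
                             n: int = 3) -> float:
    """Share of memorized tokens covered by at least one n-gram from an
    assumed set of templated phrases (a heuristic stand-in for
    'templated documentation')."""
    total = 0
    templated = 0
    for seq in memorized:
        covered = [False] * len(seq)
        for i in range(len(seq) - n + 1):
            if tuple(seq[i:i + n]) in template_ngrams:
                for j in range(i, i + n):
                    covered[j] = True
        total += len(seq)
        templated += sum(covered)
    return templated / total if total else 0.0


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a randomly chosen positive patient
    receives a higher score than a randomly chosen negative one
    (ties count as half). O(P * N), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    print(exact_match_rate([["a", "b"], ["c", "d"]], [["a", "b"], ["c", "x"]]))
    print(templated_token_fraction([["Patient", "seen", "today", "for", "HIV"]],
                                   {("Patient", "seen", "today")}))
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

On the toy inputs, one of two continuations matches exactly (rate 0.5), three of five memorized tokens fall inside a template n-gram (0.6), and three of four positive/negative pairs are ranked correctly (AUROC 0.75). Separating the template share from the raw exact-match rate is what lets a high memorization number be discounted when much of the matched text is boilerplate rather than patient-specific content.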