The authors created a reproducible framework for extracting seizure frequency from clinical letters using synthetic data generation. They used a teacher language model to generate NHS-style synthetic letters paired with structured labels, rationales, and evidence spans for seizure frequency. Open-weight language models (4B to 14B parameters) fine-tuned only on this synthetic data generalized well to real clinical letters. Structured label prediction outperformed direct numeric regression, reaching micro-F1 scores of up to 0.858 on pragmatic categories.
Instead of painstakingly annotating sensitive patient data for training, this work shows that models trained only on synthetic letters can extract seizure frequency from real clinic letters, with micro-F1 of up to 0.858 on pragmatic categories.
Seizure-frequency information is important for epilepsy research and clinical care, but it is usually recorded in variable free-text clinic letters that are hard to annotate and share. We developed a reproducible, privacy-preserving framework for extracting seizure frequency using fully synthetic yet task-faithful epilepsy letters. We defined a structured label scheme covering common descriptions of seizure burden, including explicit rates, ranges, clusters, seizure-free intervals, unknown frequency, and explicit no-seizure statements. A teacher language model generated NHS-style synthetic letters paired with normalized labels, rationales, and evidence spans. We fine-tuned several open-weight language models (4B-14B parameters) on these synthetic letters to extract seizure frequency from full documents, comparing direct numeric prediction with structured label prediction and testing evidence-grounded outputs. On a clinician-checked held-out set of real clinic letters, models trained only on synthetic data generalized well, and structured labels consistently outperformed direct numeric regression. With 15,000 synthetic training letters, models achieved micro-F1 scores up to 0.788 for fine-grained categories and 0.847 for pragmatic categories; a medically oriented 4B model achieved 0.787 and 0.858, respectively. Evidence-grounded outputs also supported rapid clinical verification and error analysis. These results show that synthetic, structured, evidence-grounded supervision can enable robust seizure-frequency extraction without sharing sensitive patient text and may generalize to other temporally complex clinical information extraction tasks.
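To make the structured, evidence-grounded labels and the micro-F1 metric more concrete, the following is a minimal Python sketch and not the authors' implementation. The `SeizureLabel` class, its field names, the `KINDS` set, and the helper functions are all hypothetical; the abstract names the label categories but does not give the schema, the normalization rules, or the exact way micro-F1 was computed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical label kinds, named after the categories listed in the abstract.
KINDS = {"rate", "range", "cluster", "seizure_free", "unknown", "no_seizures"}


@dataclass(frozen=True)
class SeizureLabel:
    """Assumed shape of one structured label (not the paper's actual schema)."""
    kind: str                       # one of KINDS
    evidence: str                   # verbatim span copied from the letter
    low: Optional[float] = None     # lower bound on seizure count, if stated
    high: Optional[float] = None    # upper bound on seizure count, if stated
    per_days: Optional[float] = None  # length of the reference period in days

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown label kind: {self.kind!r}")


def evidence_is_grounded(label: SeizureLabel, letter: str) -> bool:
    """True if the evidence span is non-empty and appears verbatim in the letter."""
    return bool(label.evidence) and label.evidence in letter


def micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Micro-averaged F1 with one gold and one predicted category per letter.

    Counts are pooled over all categories. Each wrong prediction is one false
    positive (for the predicted category) and one false negative (for the gold
    category), so in this single-label setting the result equals accuracy.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == p)
    fp = fn = len(gold) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative, made-up letter text and label.
letter = "She reports two to three focal seizures per month since the last clinic."
label = SeizureLabel(kind="range", evidence="two to three focal seizures per month",
                     low=2, high=3, per_days=30)
grounded = evidence_is_grounded(label, letter)

score = micro_f1(["rate", "range", "unknown", "rate"],
                 ["rate", "rate", "unknown", "rate"])
```

Checking that the evidence span occurs verbatim in the letter is one simple way a reviewer could verify a prediction quickly, in the spirit of the evidence-grounded outputs described above. The example letter and the category strings are invented for illustration only.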