This paper introduces a new dataset of Italian Emergency Department clinical notes annotated against a pre-defined Case Report Form (CRF) of 134 items, intended for automatic CRF-filling. The authors define the CRF-filling task and an evaluation metric, and conduct pilot experiments using an open-source LLM in a zero-shot setting. Results indicate that zero-shot CRF-filling in Italian is feasible but that LLM outputs are biased, notably towards selecting "unknown" answers.
LLMs can fill out medical forms from Italian clinical notes in a zero-shot setting, but watch out for those "unknown" biases.
Case Report Forms (CRFs) collect data about patients and are at the core of well-established practices to conduct research in clinical settings. With the recent progress of language technologies, there is an increasing interest in automatic CRF-filling from clinical notes, mostly based on the use of Large Language Models (LLMs). However, there is a general scarcity of annotated CRF data, both for training and testing LLMs, which limits the progress on this task. As a step in the direction of providing such data, we present a new dataset of clinical notes from an Italian Emergency Department annotated with respect to a pre-defined CRF containing 134 items to be filled. We provide an analysis of the data, define the CRF-filling task and a metric for its evaluation, and report on pilot experiments where we use an open-source state-of-the-art LLM to automatically execute the task. Results of the case study show that (i) CRF-filling from real clinical notes in Italian can be approached in a zero-shot setting; (ii) LLMs' results are affected by biases (e.g., a cautious behaviour favours "unknown" answers), which need to be corrected.
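To make the task and the reported bias concrete, here is a minimal sketch of how predicted CRF values might be compared with gold annotations. The abstract does not specify the paper's metric, so plain per-item accuracy is used here purely as an assumption. The function name `score_crf`, the item names, and the answer values are hypothetical and are not taken from the dataset.

```python
UNKNOWN = "unknown"  # assumed label for items the note does not answer


def score_crf(gold, pred):
    """Compare predicted CRF item values with gold annotations.

    gold, pred: dicts mapping a CRF item name to its value.
    Items missing from pred are treated as "unknown".
    Per-item accuracy is an illustrative choice, not the paper's metric.
    """
    if not gold:
        raise ValueError("gold annotations must contain at least one item")

    items = list(gold)
    predicted = [pred.get(item, UNKNOWN) for item in items]

    correct = sum(1 for item, p in zip(items, predicted) if p == gold[item])
    gold_unknown = sum(1 for item in items if gold[item] == UNKNOWN)
    pred_unknown = sum(1 for p in predicted if p == UNKNOWN)

    n = len(items)
    return {
        "accuracy": correct / n,
        # A predicted rate well above the gold rate suggests an "unknown" bias.
        "gold_unknown_rate": gold_unknown / n,
        "pred_unknown_rate": pred_unknown / n,
    }


# Hypothetical four-item example (the real CRF has 134 items).
gold = {"fever": "yes", "chest_pain": "no", "head_trauma": "unknown", "dyspnea": "yes"}
pred = {"fever": "yes", "chest_pain": "unknown", "head_trauma": "unknown", "dyspnea": "unknown"}

scores = score_crf(gold, pred)
print(scores)
```

In this toy example the model answers "unknown" for three of four items while only one gold value is "unknown", which is the kind of gap the abstract describes as a cautious bias.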