CIBERSAM; Department of Psychiatry; Department of Psychology; Madrid Autonomous University; University Hospital Infanta Elena; University Hospital Jimenez Diaz Foundation; University Hospital Rey Juan Carlos
Apr 29, 2026
arXiv:2604.27014

Fidelity, Diversity, and Privacy: A Multi-Dimensional LLM Evaluation for Clinical Data Augmentation

Guillermo Iglesias, Gema Bello-Orgaz, María Navas-Loro, Cristian Ramirez-Atencia, Mercè Salvador Robert, Enrique Baca-Garcia

AI Summary

This paper explores the use of LLMs (DeepSeek-R1, OpenBioLLM-Llama3 and Qwen 3.5) for generating synthetic mental health evaluation reports conditioned on ICD-10 codes to address the scarcity of annotated medical data. It introduces a three-dimensional evaluation framework assessing semantic fidelity, lexical diversity, and privacy/plagiarism of the generated text. Results indicate that the tested LLMs can generate clinically relevant, diverse, and privacy-safe synthetic reports suitable for augmenting training data in clinical NLP.

Key Contribution

LLMs can generate synthetic mental health records that are clinically coherent, lexically diverse, and privacy-safe, offering a promising solution to data scarcity in mental health research.

Abstract

The scarcity of high-quality annotated medical data, particularly in mental health, is a significant bottleneck for training robust machine learning models. Privacy regulations restrict data sharing, which makes synthetic data generation a promising alternative, and Large Language Models (LLMs) could serve as the generator in such a data augmentation pipeline. In the proposed methodology, DeepSeek-R1, OpenBioLLM-Llama3 and Qwen 3.5 are used to generate synthetic mental health evaluation reports conditioned on specific International Classification of Diseases, Tenth Revision (ICD-10) codes. Because naive text generation can lead to mode collapse or privacy breaches (memorization), a comprehensive evaluation framework is introduced. The generated diagnostic texts are assessed across three dimensions: semantic fidelity, lexical diversity, and privacy/plagiarism. The results demonstrate that all models can generate clinically coherent, diverse, and privacy-safe synthetic reports, significantly expanding the available training data for clinical natural language processing tasks without compromising patient confidentiality.
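The three evaluation dimensions named in the abstract can be illustrated with simple stand-in metrics. The sketch below is not the authors' implementation: the abstract does not specify which metrics the paper uses, so the function names (`cosine_bow`, `distinct_n`, `max_ngram_overlap`), the whitespace-level tokenizer, and the choice of bag-of-words cosine, distinct-n, and verbatim n-gram overlap are all assumptions chosen for illustration. In practice, semantic fidelity would more likely be measured with sentence embeddings.

```python
from collections import Counter
import math
import re


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"\w+", text.lower())


def ngrams(toks, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def cosine_bow(a, b):
    """Fidelity proxy: cosine similarity of bag-of-words counts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def distinct_n(texts, n=2):
    """Diversity proxy: unique n-grams divided by total n-grams across texts."""
    grams = [g for t in texts for g in ngrams(tokens(t), n)]
    return len(set(grams)) / len(grams) if grams else 0.0


def max_ngram_overlap(synthetic, real_texts, n=5):
    """Privacy proxy: largest share of the synthetic text's n-grams
    that appear verbatim in any single real text."""
    syn = set(ngrams(tokens(synthetic), n))
    if not syn:
        return 0.0
    return max(
        (len(syn & set(ngrams(tokens(r), n))) / len(syn) for r in real_texts),
        default=0.0,
    )
```

Under these assumptions, a synthetic report would be compared against real reports sharing its ICD-10 code with `cosine_bow`, a batch of synthetic reports would be scored with `distinct_n` to flag mode collapse, and `max_ngram_overlap` values close to 1 would flag likely memorization of a real record.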

Data Curation & Synthetic Data, Eval Frameworks & Benchmarks, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
