Napoli, Northwestern, University of Campania "Luigi

Apr 22, 2026

arXiv:2604.20791

Can "AI" Be a Doctor? A Study of Empathy, Readability, and Alignment in Clinical LLMs

Mariano Barone, Francesco Di Serio, Roberto Moio, Marco Postiglione, Giuseppe Riccio, Antonio Romano, Vincenzo Moscato

AI Summary

This study evaluates general-purpose and domain-specialized large language models (LLMs) in healthcare along three dimensions: semantic fidelity, readability, and affective resonance. Baseline models show higher affective negativity than physician-authored responses, and larger models also show higher linguistic complexity. Empathy-oriented prompting and collaborative rewriting bring model outputs closer to physician communication standards. The results indicate that LLMs are better suited to enhancing clinical communication than to replacing clinical expertise, since no model surpasses physicians on epistemic criteria.

Key Contribution

Baseline LLMs amplify negativity and, in larger models, linguistic complexity in clinical communication, while collaborative rewriting yields the strongest alignment with physician standards.

Abstract

Large Language Models (LLMs) are increasingly deployed in healthcare, yet their communicative alignment with clinical standards remains insufficiently quantified. We conduct a multidimensional evaluation of general-purpose and domain-specialized LLMs across structured medical explanations and real-world physician-patient interactions, analyzing semantic fidelity, readability, and affective resonance. Baseline models amplify affective polarity relative to physicians (Very Negative: 43.14-45.10% vs. 37.25%) and, in larger architectures such as GPT-5 and Claude, produce substantially higher linguistic complexity (Flesch-Kincaid Grade Level, FKGL, up to 16.91-17.60 vs. 11.47-12.50 in physician-authored responses). Empathy-oriented prompting reduces extreme negativity and lowers grade-level complexity (up to -6.87 FKGL points for GPT-5) but does not significantly increase semantic fidelity. Collaborative rewriting yields the strongest overall alignment. Rephrase configurations achieve the highest semantic similarity to physician answers (mean up to 0.93) while consistently improving readability and reducing affective extremity. Dual stakeholder evaluation shows that no model surpasses physicians on epistemic criteria, whereas patients consistently prefer rewritten variants for clarity and emotional tone. These findings suggest that LLMs function most effectively as collaborative communication enhancers rather than replacements for clinical expertise.
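The readability figures above are reported as FKGL, the Flesch-Kincaid Grade Level. The abstract does not say which tool the authors used to compute it, so the following is only a minimal sketch of the standard formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The function names `count_syllables` and `fkgl` are hypothetical, and the syllable counter is a rough vowel-group heuristic chosen for illustration. Established readability libraries use more careful syllable counting and will give somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent 'e' (e.g. "make"), but not "-le" or "-ee" endings.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard published coefficients."""
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    plain = "The cat sat on the mat."
    clinical = "Hypertension necessitates pharmacological intervention."
    print(f"plain:    {fkgl(plain):.2f}")
    print(f"clinical: {fkgl(clinical):.2f}")
```

Short sentences of one-syllable words score very low (even below zero), while dense clinical vocabulary pushes the grade level up sharply. That is the direction of the gap the abstract reports between model outputs and physician-authored responses.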

Constitutional AI & AI Ethics, Eval Frameworks & Benchmarks, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
