This study evaluates general-purpose and domain-specialized large language models (LLMs) in healthcare, examining semantic fidelity, readability, and affective resonance in clinical contexts. Baseline models show more extreme negative affect than physician-authored responses, and the larger models also write at a substantially higher reading grade level. Empathy-oriented prompting and collaborative rewriting bring outputs closer to physician communication standards. The research indicates that LLMs are better suited to enhancing communication than to replacing clinical expertise, since no model surpasses physicians on epistemic criteria.
LLMs may amplify negativity and complexity in clinical communication, but collaborative rewriting can significantly enhance their alignment with physician standards.
Large Language Models (LLMs) are increasingly deployed in healthcare, yet their communicative alignment with clinical standards remains insufficiently quantified. We conduct a multidimensional evaluation of general-purpose and domain-specialized LLMs across structured medical explanations and real-world physician-patient interactions, analyzing semantic fidelity, readability, and affective resonance. Baseline models amplify affective polarity relative to physicians (Very Negative: 43.14-45.10% vs. 37.25%) and, in larger architectures such as GPT-5 and Claude, produce substantially higher linguistic complexity (Flesch-Kincaid Grade Level, FKGL, up to 16.91-17.60 vs. 11.47-12.50 in physician-authored responses). Empathy-oriented prompting reduces extreme negativity and lowers grade-level complexity (up to -6.87 FKGL points for GPT-5) but does not significantly increase semantic fidelity. Collaborative rewriting yields the strongest overall alignment. Rephrase configurations achieve the highest semantic similarity to physician answers (mean up to 0.93) while consistently improving readability and reducing affective extremity. Dual stakeholder evaluation shows that no model surpasses physicians on epistemic criteria, whereas patients consistently prefer rewritten variants for clarity and emotional tone. These findings suggest that LLMs function most effectively as collaborative communication enhancers rather than replacements for clinical expertise.
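For readers unfamiliar with the readability metric reported above, the following is a minimal sketch of the standard Flesch-Kincaid Grade Level (FKGL) formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The abstract does not describe the authors' tooling, so the function names `count_syllables` and `fkgl`, the regex-based tokenization, and the vowel-group syllable heuristic are all illustrative assumptions; scores from this sketch will not exactly match those of an established readability library.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not a dictionary lookup):
    count runs of consecutive vowels and drop a likely-silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard coefficients."""
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    plain = "Take one pill each day. Call us if you feel sick."
    dense = ("Pharmacological management necessitates continuous "
             "monitoring of cardiovascular parameters.")
    print(round(fkgl(plain), 2), round(fkgl(dense), 2))
```

Under this formula, longer sentences and more syllables per word both raise the grade level, which is why rewriting a response into shorter sentences with plainer vocabulary lowers its FKGL score.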