This study evaluated the ability of ChatGPT-4o and ChatGPT-5 to simplify orthopaedic patient education materials (PEMs) in English and Spanish, comparing them with human-simplified versions. The study analyzed 806 PEM documents, with readability assessed using validated formulas and fidelity (hallucinations, omissions, and inconsistencies) rated by blinded clinicians. ChatGPT-simplified PEMs achieved lower reading grade levels than the originals, and ChatGPT-5 showed improved Spanish fidelity compared with ChatGPT-4o, reaching human-comparable hallucination rates.
AI-driven simplification using ChatGPT-5 can generate more readable orthopaedic patient education materials in both English and Spanish while maintaining acceptable fidelity, potentially improving patient comprehension and adherence.
BACKGROUND Orthopaedic patient education materials (PEMs) within Epic's Elsevier library often exceed the recommended sixth-grade reading level, with mean grade levels of 8.6 in English and 5.8 in Spanish, risking poor patient comprehension and adherence. The present study evaluated whether artificial intelligence (AI)-based text simplification can improve readability while preserving clinical accuracy. The objectives were to use previously established readability data for English and Spanish PEMs as baselines, to assess the impact of human-based and ChatGPT-based simplification on reading grade level, and to compare the fidelity of the simplified texts against the standard materials.

METHODS In March 2025, 806 orthopaedic PEM documents were simplified using standardized ChatGPT prompts. Readability was reassessed using validated English and Spanish formulas, and fidelity was evaluated in the 86 PEMs that also had human-written easy-to-read versions. Two blinded clinicians compared the human and ChatGPT-4o outputs with the originals, identifying hallucinations, omissions, and inconsistencies and grading each by severity. Following the release of ChatGPT-5, an unblinded post hoc analysis was performed using identical criteria.

RESULTS ChatGPT-4o-simplified PEMs showed mean reading grade levels of 6.1 in English and 3.5 in Spanish. Compared with human simplifications, ChatGPT-4o showed fewer English omissions, similar Spanish omissions, fewer inconsistencies in both languages, and comparable English hallucinations, but more Spanish hallucinations. Compared with ChatGPT-4o, ChatGPT-5 preserved English performance and improved Spanish fidelity, reducing hallucinations to human-comparable rates.

CONCLUSIONS AI-driven simplification can produce orthopaedic PEMs that are easier to read while maintaining acceptable fidelity. The improvements observed with ChatGPT-5 highlight its potential for clinician-supervised use in generating accessible and reliable PEMs.
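The abstract does not name the validated readability formulas used; a common choice for English-language PEM studies is the Flesch-Kincaid Grade Level. As an illustration only, the following is a minimal Python sketch of that formula, using a crude vowel-group heuristic for syllable counting (published readability tools use more careful syllabification, so scores here are approximate):

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Shorter sentences and fewer syllables per word lower the grade level, which is why simplified PEMs score closer to the recommended sixth-grade target. Spanish materials would require a Spanish-specific formula (e.g., one calibrated to Spanish syllable patterns) rather than this English one.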
CLINICAL RELEVANCE This study is clinically relevant because orthopaedic PEMs are routinely delivered through the Epic electronic health record and directly affect patient understanding, consent, and adherence in both English and Spanish. By evaluating the readability and fidelity of AI-simplified materials across languages, this study informs safe, scalable strategies to improve patient communication in everyday orthopaedic practice.