This systematic review and meta-analysis evaluated the effectiveness of AI models, including ChatGPT and Google Bard, in answering patient inquiries related to shoulder and elbow orthopaedic pathologies. Across 16 studies, AI models achieved a pooled accuracy of 78% and a pooled mean AUC of 86%, but they performed worse than human experts and produced responses with poor readability and quality. The authors conclude that AI can augment, but not replace, expert-provided patient education.
AI models currently provide inferior patient education for shoulder and elbow pathologies compared to human experts, limiting their standalone use in this setting.
Background: Artificial intelligence (AI) and machine learning (ML) have diverse applications in orthopaedic surgery, including disease diagnosis, surgical assistance, and outcome prediction. When used as adjuncts, these tools have the potential to reduce clinical workload, improve workflow, and aid clinical decision making. The objective of this systematic review is to evaluate the current literature on AI to assess its effectiveness in generating responses to inquiries related to orthopaedic upper extremity pathologies.

Methods: Three databases (PubMed, MEDLINE, EMBASE) were searched for studies involving AI and patient questions in shoulder and elbow orthopaedics. Inclusion criteria were human studies related to the shoulder and elbow that used AI models, published in English, at any level of evidence. Data on response accuracy, reliability, and quality, as well as the area under the curve (AUC) of each AI algorithm, were recorded. Meta-analyses were conducted on both the accuracy and the AUC of AI algorithms across relevant studies. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.

Results: A total of 16 studies were included in this review. Nine studies used a version of ChatGPT, one study used Google Bard, and seven studies used a variety of other AI learning models. The overall pooled accuracy of AI-generated responses was 78%, and the pooled mean AUC of the included AI algorithms was 86%. AI algorithms performed inferiorly to experts, and the overall quality and readability of AI responses were poor.

Conclusions: The AI algorithms assessed in this review demonstrated a promising degree of accuracy and performance. However, AI responses were inferior to those of experts and had poor readability, quality, and value to the patient. In its current state, AI is a powerful tool that can be used in conjunction with experts to augment patient education; however, it should not be utilized independently.
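For readers unfamiliar with how a pooled estimate such as the 78% accuracy figure is typically derived, the sketch below shows one common approach: logit-transforming each study's proportion and combining them under a DerSimonian-Laird random-effects model. The abstract does not specify which pooling model the authors used, and the per-study counts here are hypothetical placeholders, not data extracted from the 16 included studies.

```python
import math

# Hypothetical per-study data: (correct responses, total questions).
# Illustrative only; not the review's actual extracted data.
studies = [(42, 50), (31, 45), (88, 120), (55, 70)]

# Logit-transform each study's accuracy so pooling happens on an
# approximately normal scale (standard practice for proportions).
effects, variances = [], []
for k, n in studies:
    p = k / n
    effects.append(math.log(p / (1 - p)))   # logit(p)
    variances.append(1 / (n * p * (1 - p))) # approx. variance of logit(p)

# DerSimonian-Laird estimate of between-study variance (tau^2).
w = [1 / v for v in variances]
fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
df = len(studies) - 1
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects pooled logit, then back-transform to a proportion.
w_re = [1 / (v + tau2) for v in variances]
pooled_logit = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
pooled_acc = 1 / (1 + math.exp(-pooled_logit))
print(f"pooled accuracy = {pooled_acc:.1%}")
```

The random-effects weighting down-weights studies less aggressively than a fixed-effect model when between-study heterogeneity (tau^2) is large, which is usually appropriate when the included studies evaluate different AI models against different question sets.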