McMaster University · ON · UofT · Jan 28, 2026

A BLINDED COMPARISON OF THREE GENERATIVE ARTIFICIAL INTELLIGENCE CHATBOTS FOR ORTHOPAEDIC SURGERY THERAPEUTIC QUESTIONS

V. Arora, J. Silburt, M. Phillips, M. Khan, B. Petrisor, H. Chaudhry

AI Summary

This study compared the quality of responses from three AI chatbots (ChatGPT, Bing Chat, and AskOE) to orthopaedic surgery therapeutic treatment questions. Orthopaedic surgery experts blindly reviewed chatbot responses using a standardized rubric assessing clinical correctness, completeness, safety, usefulness, and references. AskOE was preferred and received significantly higher evaluation scores than both ChatGPT and Bing Chat.

Key Contribution

AskOE, an orthopaedic-specific chatbot, provides more clinically correct, complete, safe, and useful answers to orthopaedic therapeutic questions compared to general AI chatbots like ChatGPT and Bing Chat.

Abstract

This study aimed to compare the quality of responses from three chatbots (ChatGPT, Bing Chat, and AskOE) across various orthopaedic surgery therapeutic treatment questions. We identified a series of treatment-related questions across a range of subspecialties in orthopaedic surgery. Each question was entered identically into each of the three chatbots (ChatGPT, Bing Chat, and AskOE), and the responses were reviewed using a standardized rubric. Orthopaedic surgery experts associated with McMaster University and the University of Toronto reviewed all responses while blinded to the source chatbot. The primary outcomes were scores on a five-item assessment tool covering clinical correctness, clinical completeness, safety, usefulness, and references. The secondary outcome was the reviewers' preferred response for each question. We performed a mixed effects logistic regression to identify factors associated with selecting a preferred chatbot. Across all questions and answers, reviewers preferred AskOE significantly more often than both ChatGPT (P < 0.001) and Bing Chat (P < 0.001). AskOE also received significantly higher total evaluation scores than both ChatGPT (P < 0.001) and Bing Chat (P < 0.001). Further regression analysis showed that clinical correctness, clinical completeness, usefulness, and references were significantly associated with a preference for AskOE. Across all responses, four were considered to contain major errors: three from ChatGPT and one from AskOE. Reviewers significantly preferred AskOE over ChatGPT and Bing Chat across a variety of variables in orthopaedic therapy questions. This technology has important implications in a healthcare setting because it provides access to trustworthy answers in orthopaedic surgery.

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: Orthopaedic Proceedings
