This systematic review and meta-analysis evaluated the application of AI in hand surgery, including diagnostic imaging, outcome prediction, and documentation support. The analysis of 98 studies showed that AI systems have high sensitivity and specificity for distal radius fracture detection and, compared with human readers, higher sensitivity but lower specificity for scaphoid fractures. AI also demonstrated promise in outcome prediction and documentation, but requires prospective multicenter validation before routine clinical adoption.
AI demonstrates comparable performance to human readers in distal radius fracture detection and higher sensitivity in scaphoid fracture detection, suggesting a potential role in augmenting radiographic interpretation.
Background
Artificial intelligence (AI) is increasingly applied in medicine, yet its clinical integration in hand surgery remains variable and incompletely validated. This systematic review and meta-analysis evaluated current AI applications in hand surgery and benchmarked performance against human comparators where available.

Methods
Following PRISMA 2020 guidelines, PubMed/MEDLINE, Embase, Web of Science, and the Cochrane Library were searched through October 2025. Eligible studies evaluated AI systems in hand or wrist surgery with reported performance metrics. Outcomes included diagnostic accuracy, prognostic discrimination, concordance with clinical recommendations, workflow impact, and user satisfaction. Meta-analysis using a bivariate random-effects model was performed when ≥3 comparable studies were available and was restricted to radiograph-based fracture detection (distal radius and scaphoid). All other applications were synthesized narratively due to heterogeneity. The protocol was registered with PROSPERO (CRD420251230505).

Results
Of 1228 screened records, 98 studies met inclusion criteria, most addressing diagnostic imaging. For distal radius fractures, pooled AI sensitivity and specificity were 92% and 89%, compared with 95% and 94% for human readers. For scaphoid fractures, AI demonstrated higher sensitivity (85% vs. 71%) but lower specificity (83% vs. 93%). Prognostic machine-learning models outperformed clinician estimates in selected retrospective cohorts (mean accuracy 78% vs. 65%), although calibration and external validation were inconsistently reported. Large language models demonstrated feasibility in simulated settings, achieving passing specialty-exam scores and generating high-quality documentation (mean satisfaction 4.6/5), while showing high sensitivity but variable specificity in treatment recommendations. Robotic and instrument-tracking applications remain experimental.
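The review pools sensitivity and specificity with a bivariate random-effects model, which estimates both metrics jointly while accounting for between-study heterogeneity. As a simplified illustration of the random-effects idea, the sketch below pools a single proportion (e.g. per-study sensitivities) on the logit scale using the DerSimonian-Laird estimator; this is a univariate stand-in for intuition, not the bivariate model used in the paper, and the function name and input numbers are illustrative assumptions.

```python
import math

def pool_logit_random_effects(props, ns):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    props: observed per-study proportions (e.g. sensitivities), each in (0, 1)
    ns: number of subjects contributing to each proportion
    Returns the pooled proportion back-transformed from the logit scale.
    """
    # Logit-transform each proportion; approximate within-study variance
    # of the logit is 1 / (n * p * (1 - p))
    thetas = [math.log(p / (1 - p)) for p in props]
    vs = [1.0 / (n * p * (1 - p)) for p, n in zip(props, ns)]

    # Fixed-effect (inverse-variance) weights and Cochran's Q statistic
    w = [1.0 / v for v in vs]
    theta_fe = sum(wi * ti for wi, ti in zip(w, thetas)) / sum(w)
    q = sum(wi * (ti - theta_fe) ** 2 for wi, ti in zip(w, thetas))

    # DerSimonian-Laird between-study variance tau^2 (truncated at 0)
    df = len(props) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights fold tau^2 into each study's variance
    w_re = [1.0 / (v + tau2) for v in vs]
    theta_re = sum(wi * ti for wi, ti in zip(w_re, thetas)) / sum(w_re)
    return 1.0 / (1.0 + math.exp(-theta_re))

# Hypothetical per-study sensitivities and sample sizes (illustrative only)
sens = [0.94, 0.90, 0.92, 0.89]
sizes = [120, 200, 150, 180]
pooled = pool_logit_random_effects(sens, sizes)
print(f"Pooled sensitivity: {pooled:.2f}")
```

In the full bivariate (Reitsma-type) model, logit sensitivity and logit specificity are modeled jointly with a between-study covariance, which captures the threshold-driven trade-off between the two; the univariate sketch above ignores that correlation.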
Conclusions
AI demonstrates promise in selected hand-surgery tasks, particularly fracture detection, outcome prediction, and documentation support. However, the evidence is predominantly retrospective and single-center. Prospective multicenter validation and careful attention to bias, transparency, and ethical safeguards are required before routine clinical adoption. AI should augment, not replace, clinical expertise. Level of evidence: II (systematic review/meta-analysis of predominantly Level II-III studies).