Mar 4, 2026 | arXiv:2603.03957

ArthroCut: Autonomous Policy Learning for Robotic Bone Resection in Knee Arthroplasty

Xu Lu, Yiling Zhang, Wenquan Cheng, Long Ma, Longfei Ma, Fang Chen, Hongen Liao

AI Summary

ArthroCut, a novel autonomous policy learning framework, was developed to enhance knee arthroplasty robots by enabling context-aware action generation. The framework fine-tunes a Qwen-VL backbone on a multimodal dataset of 21 knee arthroplasty cases, integrating preoperative imaging, intraoperative tracking, surgical video, and robot state. Bench-top experiments demonstrated an 86% average success rate across six standard resections, significantly outperforming baselines, with ablation studies highlighting the importance of both time-aligned surgical tokens and preoperative imaging tokens.

Key Contribution

Fuses preoperative imaging with real-time intraoperative data to guide tokenized, constrained action generation for autonomous robotic knee arthroplasty, reaching an 86% average success rate across six standard resections in bench-top experiments.

Abstract

Despite rapid commercialization of surgical robots, their autonomy and real-time decision-making remain limited in practice. To address this gap, we propose ArthroCut, an autonomous policy learning framework that upgrades knee arthroplasty robots from assistive execution to context-aware action generation. ArthroCut fine-tunes a Qwen-VL backbone on a self-built, time-synchronized multimodal dataset from 21 complete cases (23,205 RGB-D pairs), integrating preoperative CT/MR, intraoperative NDI tracking of bones and end effector, RGB-D surgical video, robot state, and textual intent. The method operates on two complementary token families: Preoperative Imaging Tokens (PIT), which encode patient-specific anatomy and planned resection planes, and Time-Aligned Surgical Tokens (TAST), which fuse real-time visual, geometric, and kinematic evidence. From these tokens it emits an interpretable action grammar under grammar- and safety-constrained decoding. In bench-top experiments on a knee prosthesis across seven trials, ArthroCut achieves an average success rate of 86% over the six standard resections, significantly outperforming strong baselines trained under the same protocol. Ablations show that TAST is the principal driver of reliability while PIT provides essential anatomical grounding, and their combination yields the most stable multi-plane execution. These results indicate that aligning preoperative geometry with time-aligned intraoperative perception, and translating that alignment into tokenized, constrained actions, is an effective path toward robust, interpretable autonomy in orthopedic robotic surgery.
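The abstract does not spell out the action grammar or the safety rules, so the following is only a minimal sketch of what grammar- and safety-constrained decoding can look like in general. Every name here is hypothetical and chosen for illustration: the four-slot grammar (verb, plane, depth, end), the token names, the `unsafe` set, and `toy_scores`, which stands in for the model's per-token scores. None of it is taken from the paper.

```python
import math

# Hypothetical toy grammar (an assumption, not the paper's): VERB PLANE DEPTH <end>.
# Each decoding state lists the only tokens that may be emitted next.
GRAMMAR = {
    "expect_verb": {"MOVE_TO", "CUT", "RETRACT"},
    "expect_plane": {"DISTAL_FEMUR", "ANTERIOR_FEMUR", "PROXIMAL_TIBIA"},
    "expect_depth": {"DEPTH_2MM", "DEPTH_4MM", "DEPTH_9MM"},
    "expect_end": {"<end>"},
}
NEXT_STATE = {
    "expect_verb": "expect_plane",
    "expect_plane": "expect_depth",
    "expect_depth": "expect_end",
    "expect_end": "done",
}


def constrained_decode(score_fn, unsafe=frozenset()):
    """Greedy decoding restricted to grammar-legal, non-unsafe tokens.

    score_fn(prefix) returns a dict mapping token -> score; tokens that are
    missing from the dict are treated as having score -inf.
    """
    state, prefix = "expect_verb", []
    while state != "done":
        candidates = GRAMMAR[state] - set(unsafe)
        if not candidates:
            raise ValueError(f"no legal token in state {state!r}")
        scores = score_fn(prefix)
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda t: scores.get(t, -math.inf))
        prefix.append(best)
        state = NEXT_STATE[state]
    return prefix


def toy_scores(prefix):
    """Stand-in for model logits; deliberately prefers an illegal and an unsafe token."""
    return {
        "EXPLODE": 9.0,      # not in the grammar, can never be emitted
        "DEPTH_9MM": 3.0,    # highest-scoring depth, may be masked as unsafe
        "CUT": 2.0,
        "DEPTH_4MM": 2.0,
        "DISTAL_FEMUR": 1.5,
        "MOVE_TO": 1.0,
        "DEPTH_2MM": 0.5,
        "<end>": 0.0,
    }


unconstrained_depth = constrained_decode(toy_scores)
safe_depth = constrained_decode(toy_scores, unsafe={"DEPTH_9MM"})
print(unconstrained_depth)
print(safe_depth)
```

In this sketch the grammar mask guarantees that every emitted sequence is well-formed, and the `unsafe` set removes tokens that a separate safety check has ruled out, so the decoder falls back to the best remaining legal token. A real system would apply the same masking to the language model's logits at each generation step.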

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 38

Year: 2026

Venue: N/A
