This paper introduces a Vision-Language-Action (VLA) model for automated ultrasound-guided needle insertion and tracking, addressing the limitations of traditional modular pipelines. The model incorporates a Cross-Depth Fusion (CDF) tracking head for real-time needle tracking and a Tracking-Conditioning (TraCon) register for parameter-efficiently adapting a pretrained vision backbone. Experiments show that the VLA model outperforms state-of-the-art trackers and manual operation, achieving higher tracking accuracy, higher insertion success rates, and shorter procedure times.
Unify vision, language, and action in a single end-to-end model for robotic ultrasound-guided needle insertion, outperforming state-of-the-art trackers and manual operation.
Ultrasound (US)-guided needle insertion is a critical yet challenging procedure due to dynamic imaging conditions and the difficulty of visualizing the needle. Many methods have been proposed for automated needle insertion, but they often rely on hand-crafted pipelines with modular controllers whose performance degrades in challenging cases. In this paper, a Vision-Language-Action (VLA) model is proposed for adaptive, automated US-guided needle insertion and tracking on a robotic ultrasound (RUS) system. The framework unifies needle tracking and needle insertion control, enabling real-time, dynamically adaptive insertion adjustments based on the tracked needle position and awareness of the environment. To achieve real-time, end-to-end tracking, a Cross-Depth Fusion (CDF) tracking head is proposed that integrates shallow positional and deep semantic features from the large-scale vision backbone. To adapt the pretrained vision backbone to the tracking task, a Tracking-Conditioning (TraCon) register is introduced for parameter-efficient feature conditioning. Building on the tracking output, an uncertainty-aware control policy and an asynchronous VLA pipeline are presented for adaptive needle insertion control, ensuring timely decision-making for improved safety and outcomes. Extensive experiments on both needle tracking and insertion show that our method consistently outperforms state-of-the-art trackers and manual operation, achieving higher tracking accuracy, improved insertion success rates, and reduced procedure time. These results highlight promising directions for RUS-based intelligent intervention.
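The abstract does not give architectural details of the Cross-Depth Fusion head, but the stated idea of combining shallow positional features with deep semantic features from a ViT-style backbone can be sketched as follows. All names, dimensions, and the attention-based fusion choice here are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of cross-depth feature fusion for a tracking head,
# assuming a ViT-style backbone that exposes per-block token features.
import torch
import torch.nn as nn

class CrossDepthFusionHead(nn.Module):
    """Fuses shallow (positionally precise) and deep (semantic) token
    features and scores each token for needle presence. Layer choices
    are illustrative, not taken from the paper."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj_shallow = nn.Linear(dim, dim)
        self.proj_deep = nn.Linear(dim, dim)
        self.fuse = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.score = nn.Linear(dim, 1)  # per-token needle likelihood

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # shallow, deep: (B, N, dim) tokens from an early / late backbone block
        q = self.proj_deep(deep)          # semantic queries
        kv = self.proj_shallow(shallow)   # positionally precise keys/values
        fused, _ = self.fuse(q, kv, kv)   # cross-attention fusion
        return self.score(fused).squeeze(-1)  # (B, N) token-level scores

B, N, D = 2, 196, 256
head = CrossDepthFusionHead(D)
scores = head(torch.randn(B, N, D), torch.randn(B, N, D))
print(scores.shape)  # torch.Size([2, 196])
```

The token-level scores could then be reshaped into a spatial heatmap and argmax-decoded into a needle-tip estimate; the paper's actual decoding scheme is not specified in the abstract.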
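The uncertainty-aware control policy is only named in the abstract, but its safety rationale — act on the tracked needle position only when the estimate is reliable — can be sketched minimally. The confidence threshold, proportional-control form, and hold fallback below are assumptions for illustration, not the paper's policy.

```python
# Hedged sketch of an uncertainty-gated insertion step, assuming the
# tracker returns a tip estimate with a scalar confidence in [0, 1].
from dataclasses import dataclass

@dataclass
class TrackEstimate:
    tip_xy: tuple[float, float]   # needle-tip estimate in image coordinates
    confidence: float             # tracker confidence in [0, 1]

def insertion_step(est: TrackEstimate, target_xy: tuple[float, float],
                   conf_threshold: float = 0.8, gain: float = 0.5):
    """Return a velocity command toward the target, or hold position
    when the tracking estimate is too uncertain (safety fallback)."""
    if est.confidence < conf_threshold:
        return (0.0, 0.0)  # hold: do not advance on unreliable tracking
    dx = target_xy[0] - est.tip_xy[0]
    dy = target_xy[1] - est.tip_xy[1]
    return (gain * dx, gain * dy)

cmd = insertion_step(TrackEstimate((10.0, 20.0), 0.9), (14.0, 20.0))
print(cmd)  # (2.0, 0.0)
```

In an asynchronous pipeline such as the one the abstract describes, a step like this could run in a fast control loop while the heavier VLA model updates targets at a lower rate; that split is likewise an assumption here.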