Independent ResearcherJan 27, 2025arXiv:2501.18632

Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare

AI Summary

This paper investigates the vulnerability of seven LLMs to jailbreaking attacks within medical contexts using three black-box techniques. It introduces an automated, domain-adapted agentic evaluation pipeline to quantify the success of these attacks. The study finds that both commercial and open-source LLMs are susceptible to medical jailbreaking and explores the effectiveness of Continual Fine-Tuning (CFT) as a defense mechanism.

Key Contribution

Despite their promise, today's best LLMs are alarmingly easy to jailbreak in medical contexts, raising serious concerns about their safe deployment as AI clinicians.

Abstract

Large language models (LLMs) are increasingly utilized in healthcare applications. However, their deployment in clinical practice raises significant safety concerns, including the potential spread of harmful information. This study systematically assesses the vulnerabilities of seven LLMs to three advanced black-box jailbreaking techniques within medical contexts. To quantify the effectiveness of these techniques, we propose an automated and domain-adapted agentic evaluation pipeline. Experiment results indicate that leading commercial and open-source LLMs are highly vulnerable to medical jailbreaking attacks. To bolster model safety and reliability, we further investigate the effectiveness of Continual Fine-Tuning (CFT) in defending against medical adversarial attacks. Our findings underscore the necessity for evolving attack methods evaluation, domain-specific safety alignment, and LLM safety-utility balancing. This research offers actionable insights for advancing the safety and reliability of AI clinicians, contributing to ethical and effective AI deployment in healthcare.

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Citation Metrics

Citations8

Influential citations0

References30

Year2025

VenuearXiv.org

Related Papers

Finding related papers...

Search

Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare

Related Papers