UWHealthResearch/MedMisBench | Oxford | UCL | Waterloo | Jun 10, 2026 | arXiv:2606.12291

MedMisBench: Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

Bradley Max Segal, Dhruv Darji, Joshua Fieggen, Kapil Narain, Mingde Zeng, Lei Clifton

AI Summary

This study introduces MedMisBench, a benchmark that evaluates the epistemic resilience of large language models (LLMs) in medical contexts: their ability to maintain correct judgments when given misleading information. Across 11 model configurations, mean accuracy drops from 71.1% on original medical questions to 38.0% when misleading context is introduced. Authority-framed falsehoods and exception-poisoning claims were the most effective at undermining model performance, and a clinical panel identified serious potential harm in over a third of reviewed cases.

Key Contribution

LLMs can lose more than 30 percentage points of accuracy in medical judgment when exposed to misleading context (mean accuracy falls from 71.1% to 38.0%), revealing a critical vulnerability in their deployment for health advice.

Abstract

Large language models (LLMs) now reach expert-level scores on medical licensing exams, encouraging the assumption that high scores imply safe medical judgment while patients increasingly use them for health advice. We show this assumption is fragile: when misleading context is injected into questions that LLMs originally answer correctly, they abandon the correct answer. We call the ability to maintain correct judgment under adversarial context epistemic resilience, and introduce MedMisBench to measure it. MedMisBench contains 10,932 medical question items and 48,889 misleading context-option pairs spanning medical reasoning, agentic capability, and patient-journey evaluation. Across 11 model configurations, mean accuracy falls from 71.1% on original questions to 38.0% under focused misleading context, with 51.5% attack success. The most damaging injections are formal, rule-like fabrications: authority-framed falsehoods reach 69.5% attack success and exception-poisoning claims reach 64.1%. A 14-member clinical panel from 7 countries identified serious potential harm in 38.2% of reviewed cases. MedMisBench exposes a structural blind spot in LLM evaluation in medical settings: existing benchmarks measure what models know, but not whether they preserve correct medical judgment under misleading context.
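The headline metrics in the abstract can be illustrated with a minimal sketch. The abstract does not define attack success formally, so the definition used here is an assumption: among items the model answers correctly at baseline, the fraction where it switches to the wrong option that the injected context promotes. The `Item` record, its field names, and the toy data are hypothetical and are not taken from MedMisBench itself; only the figures 71.1% and 38.0% come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record for one question and one misleading context."""
    gold: str      # correct option label
    original: str  # model answer on the unmodified question
    misled: str    # model answer after misleading context is injected
    target: str    # wrong option the injected context pushes toward


def accuracy(items, field):
    """Fraction of items where the answer stored in `field` matches gold."""
    return sum(getattr(it, field) == it.gold for it in items) / len(items)


def attack_success_rate(items):
    """ASSUMED definition: among baseline-correct items, the share where
    the model switches to the targeted wrong option."""
    eligible = [it for it in items if it.original == it.gold]
    if not eligible:
        return 0.0
    return sum(it.misled == it.target for it in eligible) / len(eligible)


# Toy data, invented purely for illustration.
items = [
    Item(gold="A", original="A", misled="B", target="B"),  # flipped to target
    Item(gold="C", original="C", misled="C", target="D"),  # held firm
    Item(gold="B", original="B", misled="D", target="D"),  # flipped to target
    Item(gold="D", original="A", misled="A", target="B"),  # wrong at baseline
]

print(accuracy(items, "original"))   # baseline accuracy on the toy data
print(accuracy(items, "misled"))     # accuracy under misleading context
print(attack_success_rate(items))    # attack success under the assumed definition

# Arithmetic on the mean accuracies reported in the abstract.
point_drop = 71.1 - 38.0              # absolute drop in percentage points
relative_drop = point_drop / 71.1     # share of baseline accuracy lost
print(round(point_drop, 1), round(relative_drop, 3))
```

The last two lines separate the absolute drop (about 33 percentage points) from the relative drop (a little under half of baseline accuracy), two quantities that are easy to conflate when the result is summarized as a single percentage.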

Eval Frameworks & Benchmarks | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 45

Year: 2026

Venue: bioRxiv
