The paper introduces MIRA, a bilingual benchmark that evaluates whether LLMs provide consistent medical information when the same question is phrased differently, varying the user's language, register, and health-literacy signals. The study identifies a pattern it calls "Differential Information Dilution" (DID), in which responses to low health-literacy prompts omit more key information, give fewer concrete next steps, and offer less support for independent judgment. A knowledge-guided mitigation prompt reduces information dilution for most models, with the largest reductions observed for Claude and Qwen.
LLMs answering medical questions consistently dilute information when responding to prompts indicating low health literacy, even while answering all questions posed.
Large language models (LLMs) are increasingly used to provide public-facing health information, yet existing safety evaluations overlook whether responses preserve comparable medical information across different user phrasings of the same question. To address this, we introduce the Medical Information Response Audit (MIRA), a bilingual, controlled benchmark that assesses whether LLMs provide comparable medical information across user-side language, register, and health literacy signals. MIRA contains 4,320 prompts built from 60 medically reviewed, low-risk health questions. Across five mainstream LLMs, models answered all medical questions, but responses to low health-literacy signals consistently omitted more key information, provided fewer concrete next steps, and offered less support for independent judgment. We term this pattern Differential Information Dilution (DID). Language effects are model-specific rather than uniformly worse for non-English prompts. A comparison with 300 real-world health queries provides preliminary evidence of rank-order validity. A knowledge-guided mitigation prompt reduces information dilution for most models, with the largest reductions in underinformative simplification observed for Claude (~8%) and Qwen (~6%).
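To make the idea of information dilution concrete, the following is a minimal sketch of how a gap in key-information coverage between two prompt conditions could be measured. It is an illustration only and not the paper's scoring method: the function names `key_point_coverage` and `dilution_gap`, the substring-matching rule, and the example strings are all assumptions introduced here.

```python
from statistics import mean


def key_point_coverage(response: str, key_points: list[str]) -> float:
    """Fraction of key points found in the response.

    Assumption: a key point counts as covered if its text appears as a
    case-insensitive substring. A real audit would need a more robust matcher.
    """
    text = response.lower()
    return sum(kp.lower() in text for kp in key_points) / len(key_points)


def dilution_gap(high_lit: list[str], low_lit: list[str], key_points: list[str]) -> float:
    """Mean coverage for high-literacy responses minus mean coverage for low-literacy ones.

    A positive value means the low-literacy responses carried less key information.
    """
    high = mean(key_point_coverage(r, key_points) for r in high_lit)
    low = mean(key_point_coverage(r, key_points) for r in low_lit)
    return high - low


# Hypothetical example data, not drawn from MIRA.
key_points = ["rest", "fluids", "see a doctor"]
high_lit = ["Get rest, drink fluids, and see a doctor if the fever persists."]
low_lit = ["Just rest."]
gap = dilution_gap(high_lit, low_lit, key_points)
```

In this toy case the high-literacy response covers all three key points and the low-literacy response covers one, so the gap is positive. Substring matching is used only to keep the sketch short; it would miss paraphrases and is not a substitute for expert or rubric-based review.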