The authors introduce MedLayBench-V, a new large-scale multimodal benchmark designed to evaluate and improve the ability of medical vision-language models (Med-VLMs) to communicate medical image findings in a way that is understandable to laypersons. The benchmark is constructed using a Structured Concept-Grounded Refinement (SCGR) pipeline that leverages UMLS CUIs and micro-level entity constraints to ensure semantic equivalence between expert and lay descriptions. This dataset addresses the current lack of resources for training and evaluating Med-VLMs on lay-accessible medical image understanding.
Current medical vision-language models can't explain medical images to patients, but MedLayBench-V offers a way to fix that.
Medical Vision-Language Models (Med-VLMs) have achieved expert-level proficiency in interpreting diagnostic imaging. However, current models are predominantly trained on professional literature, limiting their ability to communicate findings in the lay register required for patient-centered care. While text-centric research has actively developed resources for simplifying medical jargon, there is a critical absence of large-scale multimodal benchmarks designed to facilitate lay-accessible medical image understanding. To bridge this resource gap, we introduce MedLayBench-V, the first large-scale multimodal benchmark dedicated to expert-lay semantic alignment. Unlike naive simplification approaches that risk hallucination, our dataset is constructed via a Structured Concept-Grounded Refinement (SCGR) pipeline. This method enforces strict semantic equivalence by integrating Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) with micro-level entity constraints. MedLayBench-V provides a verified foundation for training and evaluating next-generation Med-VLMs capable of bridging the communication divide between clinical experts and patients.
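To make the idea of concept-grounded equivalence concrete, here is a minimal sketch of checking that an expert caption and its lay rewrite refer to the same set of concepts. It is not the SCGR pipeline itself, whose implementation the abstract does not detail. The lexicon, the placeholder concept IDs (`CUI_0001` and so on), and the function names `extract_concepts` and `compare_concepts` are all hypothetical. A real system would presumably obtain UMLS CUIs from a concept-linking tool rather than from substring matching.

```python
import re

# Hypothetical toy lexicon: surface phrase -> placeholder concept ID.
# These IDs are made up for illustration; they are NOT real UMLS CUIs.
TOY_LEXICON = {
    "pleural effusion": "CUI_0001",
    "fluid around the lung": "CUI_0001",
    "cardiomegaly": "CUI_0002",
    "enlarged heart": "CUI_0002",
    "pneumothorax": "CUI_0003",
    "collapsed lung": "CUI_0003",
}


def extract_concepts(text, lexicon=TOY_LEXICON):
    """Return the set of concept IDs whose surface phrase occurs in the text."""
    normalized = re.sub(r"\s+", " ", text.lower())
    return {cui for phrase, cui in lexicon.items() if phrase in normalized}


def compare_concepts(expert_text, lay_text, lexicon=TOY_LEXICON):
    """Compare concept sets of an expert caption and its lay rewrite.

    'missing' holds concepts dropped by the lay text (information loss);
    'added' holds concepts that appear only in the lay text (possible hallucination).
    """
    expert = extract_concepts(expert_text, lexicon)
    lay = extract_concepts(lay_text, lexicon)
    return {
        "missing": expert - lay,
        "added": lay - expert,
        "equivalent": expert == lay,
    }


expert = "Chest radiograph shows cardiomegaly and a small left pleural effusion."
lay_ok = "The X-ray shows an enlarged heart and a little fluid around the lung on the left."
lay_bad = "The X-ray shows an enlarged heart and a collapsed lung."

print(compare_concepts(expert, lay_ok))
print(compare_concepts(expert, lay_bad))
```

In this sketch the first lay rewrite preserves both concepts, while the second drops one finding and introduces another. That is the kind of drift a concept-level constraint is meant to catch, regardless of how fluent the simplified text reads.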