Apr 7, 2026 · arXiv:2604.05738

MedLayBench-V: A Large-Scale Benchmark for Expert-Lay Semantic Alignment in Medical Vision Language Models

Han Jang, Junhyeok Lee, Heeseong Eum, Kyu Sung Choi

AI Summary

The authors introduce MedLayBench-V, a new large-scale multimodal benchmark designed to evaluate and improve the ability of medical vision-language models (Med-VLMs) to communicate medical image findings in a way that is understandable to laypersons. The benchmark is constructed using a Structured Concept-Grounded Refinement (SCGR) pipeline that leverages UMLS CUIs and micro-level entity constraints to ensure semantic equivalence between expert and lay descriptions. This dataset addresses the current lack of resources for training and evaluating Med-VLMs on lay-accessible medical image understanding.

Key Contribution

Current medical vision-language models are trained mostly on professional literature and struggle to explain image findings in lay terms; MedLayBench-V provides a benchmark for training and evaluating that ability.

Abstract

Medical Vision-Language Models (Med-VLMs) have achieved expert-level proficiency in interpreting diagnostic imaging. However, current models are predominantly trained on professional literature, limiting their ability to communicate findings in the lay register required for patient-centered care. While text-centric research has actively developed resources for simplifying medical jargon, there is a critical absence of large-scale multimodal benchmarks designed to facilitate lay-accessible medical image understanding. To bridge this resource gap, we introduce MedLayBench-V, the first large-scale multimodal benchmark dedicated to expert-lay semantic alignment. Unlike naive simplification approaches that risk hallucination, our dataset is constructed via a Structured Concept-Grounded Refinement (SCGR) pipeline. This method enforces strict semantic equivalence by integrating Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) with micro-level entity constraints. MedLayBench-V provides a verified foundation for training and evaluating next-generation Med-VLMs capable of bridging the communication divide between clinical experts and patients.

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
