Fudan, HUST, Northwestern, Shanghai Innovation, ZJU | Apr 15, 2026 | arXiv:2604.13756

MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging

Zhijie Bao, Fangke Chen, Licheng Bao, Chenhui Zhang, Jiajie Peng

AI Summary

The paper introduces MedRCube, a multidimensional evaluation framework for MLLMs in medical imaging, designed to give fine-grained insight into model performance and reasoning. The authors benchmark 33 MLLMs, showing the limits of single or coarse-grained metrics, with Lingshu-32B achieving top-tier performance. A credibility evaluation subset reveals a highly significant positive association between shortcut behavior and diagnostic task performance, raising concerns about clinical trustworthiness.

Key Contribution

MLLMs that excel at medical image diagnosis may be relying on shortcuts, undermining their trustworthiness for clinical deployment.

Abstract

The potential of Multimodal Large Language Models (MLLMs) in the domain of medical imaging raises the demand for systematic and rigorous evaluation frameworks that are aligned with real-world medical imaging practice. Existing practices that report single or coarse-grained metrics lack the granularity required for specialized clinical support and fail to assess the reliability of reasoning mechanisms. To address this, we propose a paradigm shift toward multidimensional, fine-grained, and in-depth evaluation. Based on a two-stage systematic construction pipeline designed for this paradigm, we instantiate it with MedRCube. We benchmark 33 MLLMs, among which Lingshu-32B achieves top-tier performance. Crucially, MedRCube exposes a series of pronounced insights inaccessible under prior evaluation settings. Furthermore, we introduce a credibility evaluation subset to quantify reasoning credibility and uncover a highly significant positive association between shortcut behavior and diagnostic task performance, raising concerns for clinically trustworthy deployment. The resources of this work can be found at https://github.com/F1mc/MedRCube.
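To make the reported association concrete, the following is a minimal sketch of how a positive association between per-model shortcut behavior and diagnostic performance could be checked with a rank correlation. The abstract does not say which statistic the authors use or how shortcut behavior is scored, so the choice of Spearman correlation, the variable names `shortcut_rate` and `diagnostic_acc`, and all numbers below are assumptions made purely for illustration, not results from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-model scores, invented for illustration only.
shortcut_rate = [0.10, 0.25, 0.30, 0.45, 0.60]
diagnostic_acc = [0.52, 0.58, 0.61, 0.70, 0.74]

rho = spearman(shortcut_rate, diagnostic_acc)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near +1 on real per-model scores would indicate that models with more shortcut behavior also tend to score higher on diagnostic tasks, which is the pattern the abstract flags as a concern for clinical trust.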

Computer Vision | Eval Frameworks & Benchmarks | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
