The paper introduces LungCURE, a multimodal benchmark of 1,000 real-world lung cancer cases for evaluating MLLMs on oncological precision treatment (OPT) tasks such as TNM staging and treatment recommendation. It shows that existing MLLMs fail at guideline-constrained staging and treatment reasoning. To address this, the authors propose LCAgent, a multi-agent framework that enforces guideline compliance and suppresses cascading reasoning errors, yielding improved performance in real-world medical scenarios.
Current multimodal LLMs struggle with guideline-constrained clinical reasoning, but a simple multi-agent framework can significantly boost their performance on real-world lung cancer diagnosis and treatment.
Lung cancer clinical decision support demands precise reasoning across complex, multi-stage oncological workflows. Existing multimodal large language models (MLLMs) fail to handle guideline-constrained staging and treatment reasoning. We formalize three oncological precision treatment (OPT) tasks for lung cancer, spanning TNM staging, treatment recommendation, and end-to-end clinical decision support. We introduce LungCURE, the first standardized multimodal benchmark built from 1,000 real-world, clinician-labeled cases collected from more than 10 hospitals. We further propose LCAgent, a multi-agent framework that ensures guideline-compliant lung cancer clinical decision-making by suppressing cascading reasoning errors along the clinical pathway. Experiments reveal large differences among large language models (LLMs) in their capacity for complex medical reasoning under precise treatment requirements. We further verify that LCAgent, as a simple yet effective plugin, enhances the reasoning performance of LLMs in real-world medical scenarios.
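To make the cascading-error idea concrete, the sketch below shows one plausible shape for such a pipeline; it is not the paper's implementation. The guideline table, agent functions, and stage/treatment labels are all hypothetical stand-ins: each "agent" would in practice be an MLLM call, and the key point is that a guideline check gates every hand-off, so an invalid intermediate result halts the pipeline instead of propagating into later stages.

```python
# Illustrative sketch of a guideline-gated multi-agent pipeline (assumed
# design, not LCAgent's actual architecture). Each stage's output is
# validated before the next stage runs, suppressing cascading errors.
from dataclasses import dataclass
from typing import Optional

# Hypothetical, drastically simplified guideline table:
# stage group -> treatments the guideline permits.
GUIDELINE = {
    "I": {"surgery"},
    "II": {"surgery", "chemotherapy"},
    "III": {"chemoradiotherapy", "surgery"},
    "IV": {"systemic therapy"},
}

@dataclass
class CaseRecord:
    findings: str                      # free-text imaging/pathology findings
    stage: Optional[str] = None
    treatment: Optional[str] = None

def staging_agent(case: CaseRecord) -> str:
    """Stand-in for an MLLM call mapping findings to a stage group."""
    return "II" if "node" in case.findings else "I"

def treatment_agent(case: CaseRecord) -> str:
    """Stand-in for an MLLM call proposing a treatment for the staged case."""
    return "chemotherapy" if case.stage == "II" else "surgery"

def check_guideline(stage: str, treatment: str) -> bool:
    """Reject any stage/treatment pair outside the guideline table."""
    return treatment in GUIDELINE.get(stage, set())

def run_pipeline(case: CaseRecord) -> CaseRecord:
    # Gate 1: the staging output must be a recognized stage group.
    case.stage = staging_agent(case)
    if case.stage not in GUIDELINE:
        raise ValueError(f"invalid stage {case.stage!r}: halting before it cascades")
    # Gate 2: the proposed treatment must be guideline-compliant for that stage.
    case.treatment = treatment_agent(case)
    if not check_guideline(case.stage, case.treatment):
        raise ValueError(f"{case.treatment!r} is not guideline-compliant for stage {case.stage}")
    return case
```

The design choice the sketch illustrates is that compliance checking lives between agents rather than inside any one of them: a single faulty staging output is caught at the hand-off, which is what prevents one early error from corrupting the entire downstream recommendation.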