NIH | Feb 16, 2026 | arXiv:2602.14879

CT-Bench: A Benchmark for Multimodal Lesion Understanding in Computed Tomography

Qingqing Zhu, Qiao Jin, Tejas S. Mathai, Maame Sarfo-Gyamfi, Benjamin Hou, Ran Gu, Praveen T. S. Balamuralikrishna, Kenneth C. Wang, Ronald M. Summers, Zhiyong Lu

AI Summary

The authors introduce CT-Bench, a benchmark dataset for multimodal lesion understanding in CT. It comprises a Lesion Image and Metadata Set of 20,335 lesions with bounding boxes, descriptions, and size information, plus a visual question answering (VQA) benchmark of 2,850 QA pairs. They evaluate state-of-the-art multimodal models, including vision-language models and medical CLIP variants, against radiologist assessments, and include hard negative examples to reflect real-world diagnostic challenges. Fine-tuning on the Lesion Image and Metadata Set significantly improves performance on both components.

Key Contribution

CT-Bench reveals that even state-of-the-art multimodal models struggle with lesion understanding in CT scans, highlighting the need for specialized datasets and fine-tuning to bridge the gap between AI and radiologist performance.

Abstract

Artificial intelligence (AI) can automatically delineate lesions on computed tomography (CT) and generate radiology report content, yet progress is limited by the scarcity of publicly available CT datasets with lesion-level annotations. To bridge this gap, we introduce CT-Bench, a first-of-its-kind benchmark dataset comprising two components: a Lesion Image and Metadata Set containing 20,335 lesions from 7,795 CT studies with bounding boxes, descriptions, and size information, and a multitask visual question answering benchmark with 2,850 QA pairs covering lesion localization, description, size estimation, and attribute categorization. Hard negative examples are included to reflect real-world diagnostic challenges. We evaluate multiple state-of-the-art multimodal models, including vision-language and medical CLIP variants, by comparing their performance to radiologist assessments, demonstrating the value of CT-Bench as a comprehensive benchmark for lesion analysis. Moreover, fine-tuning models on the Lesion Image and Metadata Set yields significant performance gains across both components, underscoring the clinical utility of CT-Bench.
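The two components described in the abstract can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration and not the released CT-Bench schema: the `LesionRecord` and `QAItem` classes, their field names, the millimetre unit for size, and the multiple-choice format for QA items. The `accuracy_by_task` helper shows one plausible way to score a model separately on each task.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LesionRecord:
    """Hypothetical record for one annotated lesion (field names are assumed)."""
    study_id: str
    bbox_xyxy: Tuple[int, int, int, int]  # assumed pixel box: x_min, y_min, x_max, y_max
    description: str
    size_mm: Tuple[float, float]  # assumed unit and layout: (long axis, short axis)


@dataclass(frozen=True)
class QAItem:
    """Hypothetical multiple-choice QA pair (the answer format is assumed)."""
    task: str  # e.g. "localization", "description", "size_estimation", "attribute"
    question: str
    options: Tuple[str, ...]  # the distractors could serve as hard negatives
    answer_index: int


def accuracy_by_task(items: List[QAItem], predictions: List[int]) -> Dict[str, float]:
    """Return the fraction of correctly answered items for each task."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item, predicted_index in zip(items, predictions):
        total[item.task] += 1
        correct[item.task] += int(predicted_index == item.answer_index)
    return {task: correct[task] / total[task] for task in total}
```

Reporting accuracy per task rather than as a single pooled number keeps a weakness in one skill, such as size estimation, from being hidden by strength in another.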

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
