The authors introduce CT-Bench, a benchmark dataset for multimodal lesion understanding in CT scans. It comprises 20,335 lesions annotated with bounding boxes, descriptions, and size information, along with a visual question answering (VQA) benchmark of 2,850 QA pairs. They evaluate state-of-the-art multimodal models, including vision-language models and medical CLIP variants, on CT-Bench and compare them against radiologist assessments; the results show the benchmark's utility for assessing lesion analysis and the difficulty posed by hard negative examples. Fine-tuning models on the Lesion Image and Metadata Set significantly improves performance on both the lesion image and VQA components, which supports the dataset's clinical utility.
CT-Bench reveals that even state-of-the-art multimodal models struggle with lesion understanding in CT scans, highlighting the need for specialized datasets and fine-tuning to bridge the gap between AI and radiologist performance.
Artificial intelligence (AI) can automatically delineate lesions on computed tomography (CT) and generate radiology report content, yet progress is limited by the scarcity of publicly available CT datasets with lesion-level annotations. To bridge this gap, we introduce CT-Bench, a first-of-its-kind benchmark dataset comprising two components: a Lesion Image and Metadata Set containing 20,335 lesions from 7,795 CT studies with bounding boxes, descriptions, and size information, and a multitask visual question answering benchmark with 2,850 QA pairs covering lesion localization, description, size estimation, and attribute categorization. Hard negative examples are included to reflect real-world diagnostic challenges. We evaluate multiple state-of-the-art multimodal models, including vision-language models and medical CLIP variants, by comparing their performance to radiologist assessments, demonstrating the value of CT-Bench as a comprehensive benchmark for lesion analysis. Moreover, fine-tuning models on the Lesion Image and Metadata Set yields significant performance gains across both components, underscoring the clinical utility of CT-Bench.
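To make the two components concrete, the following is a minimal sketch of how a lesion record and a QA item might be represented, and how per-task accuracy could be computed. Everything here is an assumption for illustration: the abstract does not specify a schema, so the class names, field names, bounding-box convention, task labels, and the multiple-choice format (with hard negatives acting as distractor options) are all hypothetical and are not CT-Bench's actual format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class LesionRecord:
    """Hypothetical entry in a lesion image and metadata set."""
    study_id: str
    bbox: Tuple[int, int, int, int]   # assumed (x_min, y_min, x_max, y_max) in pixels
    description: str                  # free-text lesion description
    size_mm: Tuple[float, float]      # assumed (long axis, short axis) in millimetres


@dataclass(frozen=True)
class QAItem:
    """Hypothetical multiple-choice QA item; wrong options stand in for hard negatives."""
    task: str                         # e.g. "localization", "size" (assumed labels)
    question: str
    options: Tuple[str, ...]
    answer_index: int                 # index of the correct option


def accuracy_by_task(items: Sequence[QAItem],
                     predictions: Sequence[int]) -> Dict[str, float]:
    """Return the fraction of correctly answered items for each task."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item, predicted in zip(items, predictions):
        total[item.task] += 1
        correct[item.task] += int(predicted == item.answer_index)
    return {task: correct[task] / total[task] for task in total}
```

Reporting accuracy per task rather than as a single pooled number keeps localization, description, size estimation, and attribute categorization separately comparable between models and radiologists.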