University of Science and Technology · Mar 18, 2026 · arXiv:2603.17718

DiffVP: Differential Visual Semantic Prompting for LLM-Based CT Report Generation

Yuhe Tian, Kun Zhang, Haoran Ma, Rui Yan, Yingtai Li, Rongsheng Wang, Shaohua Kevin Zhou

AI Summary

The paper introduces Differential Visual Prompting (DiffVP) for CT report generation. DiffVP conditions the LLM on semantic differences between a patient's scan and a reference scan, steering it toward diagnostically relevant information. A hierarchical difference extractor captures global and local semantic discrepancies, and a difference-to-prompt generator turns them into learnable visual prefix tokens that condition the LLM. Experiments on two large-scale benchmarks show that DiffVP consistently outperforms existing methods, improving average BLEU-1-4 by +10.98 and +4.36 respectively, and raising clinical efficacy (F1 score 0.421 on RadGenome-ChestCT).
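The two components described above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration, not the authors' implementation: the hypothetical function names `hierarchical_difference` and `difference_to_prompt`, mean pooling for the global difference, patch-wise subtraction for the local difference (which presumes the scan and reference are spatially aligned), and single-head cross-attention from learnable queries to produce the prefix tokens. Random matrices stand in for trained weights and encoder features.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalise each vector to zero mean and unit variance (no learned affine terms)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_difference(scan, ref, w_global, w_local):
    """Hypothetical extractor. scan, ref: (N, C) patch features of the patient
    scan and the reference scan; returns (1 + N, D) difference tokens."""
    # Global discrepancy: difference between mean-pooled volume embeddings
    g = scan.mean(axis=0) - ref.mean(axis=0)        # (C,)
    # Local discrepancy: patch-wise difference (assumes aligned patches)
    l = scan - ref                                  # (N, C)
    # Project both levels into one shared D-dimensional latent space
    g_lat = layer_norm(g @ w_global)[None, :]       # (1, D)
    l_lat = layer_norm(l @ w_local)                 # (N, D)
    return np.concatenate([g_lat, l_lat], axis=0)

def difference_to_prompt(diff, queries, w_k, w_v, w_out):
    """Hypothetical generator: K learnable queries cross-attend to the
    difference tokens, then are mapped to the LLM embedding width E."""
    k = diff @ w_k                                  # (M, D)
    v = diff @ w_v                                  # (M, D)
    scores = queries @ k.T / np.sqrt(k.shape[-1])   # (K, M)
    return softmax(scores) @ v @ w_out              # (K, E)

# Toy sizes: patches, feature dim, latent dim, LLM dim, prompt count, text length
N, C, D, E, K, T = 8, 32, 16, 64, 4, 10
rng = np.random.default_rng(0)
scan = rng.normal(size=(N, C))
ref = rng.normal(size=(N, C))
w_global, w_local = rng.normal(size=(C, D)), rng.normal(size=(C, D))
queries = rng.normal(size=(K, D))                   # would be learned in training
w_k, w_v = rng.normal(size=(D, D)), rng.normal(size=(D, D))
w_out = rng.normal(size=(D, E))

diff = hierarchical_difference(scan, ref, w_global, w_local)
prompts = difference_to_prompt(diff, queries, w_k, w_v, w_out)
text_emb = rng.normal(size=(T, E))                  # stand-in for LLM token embeddings
llm_input = np.concatenate([prompts, text_emb], axis=0)  # prefix tokens come first
```

In this toy version, an identical scan and reference yield all-zero difference tokens and therefore all-zero prompts, which is one simple way to see how conditioning on differences can discard anatomy shared with the reference. The paper describes that suppression as implicit and learned, not hard-coded as it is here.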

Key Contribution

By focusing on semantic differences between scans, DiffVP lets LLMs generate more accurate CT reports without needing explicit lesion localization.

Abstract

While large language models (LLMs) have advanced CT report generation, existing methods typically encode 3D volumes holistically and fail to distinguish informative cues from redundant anatomical background. Inspired by radiological cognitive subtraction, we propose Differential Visual Prompting (DiffVP), which conditions report generation on explicit, high-level semantic scan-to-reference differences rather than solely on absolute visual features. DiffVP employs a hierarchical difference extractor that captures complementary global and local semantic discrepancies and maps them into a shared latent space, along with a difference-to-prompt generator that transforms these signals into learnable visual prefix tokens for LLM conditioning. These difference prompts serve as structured conditioning signals that implicitly suppress invariant anatomy while amplifying diagnostically relevant visual evidence, thereby facilitating accurate report generation without explicit lesion localization. On two large-scale benchmarks, DiffVP consistently outperforms prior methods, improving the average BLEU-1-4 by +10.98 and +4.36, respectively, and further boosts clinical efficacy on RadGenome-ChestCT (F1 score 0.421). All code will be released at https://github.com/ArielTYH/DiffVP/.
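The abstract reports gains in "average BLEU-1-4". A common reading, assumed here, is the arithmetic mean of cumulative BLEU-1 through BLEU-4; the exact toolkit and tokenisation used in the paper are not stated in this summary. The minimal sketch below computes sentence-level BLEU-n with clipped n-gram precision, uniform weights, a brevity penalty, and no smoothing. Published numbers are usually corpus-level and often reported on a 0-100 scale, so values from this sketch are not directly comparable. The example sentences are made up for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in a token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n):
    """Cumulative sentence-level BLEU-max_n, uniform weights, no smoothing."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: only hypotheses shorter than the reference are penalised
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)

def average_bleu_1_4(hyp, ref):
    # Assumed definition: arithmetic mean of BLEU-1 .. BLEU-4
    return sum(bleu(hyp, ref, n) for n in range(1, 5)) / 4

hyp = "no focal consolidation in either lung".split()
ref = "no focal consolidation or pleural effusion".split()
print(f"average BLEU-1-4 x100: {100 * average_bleu_1_4(hyp, ref):.2f}")
```

Here BLEU-4 is zero because the hypothesis shares no 4-gram with the reference, which is why smoothing or corpus-level aggregation is normally used when scoring short report sentences.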

Computer Vision · Multimodal Models · Natural Language Processing


Year: 2026

Venue: N/A
