UCL, University of Liverpool, Xi'an Jiaotong-Liverpool University
Apr 7, 2026
arXiv:2604.05620

Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening

Chenyu Xue, Yiran Liu, Mian Zhou, Jionglong Su, Zhixiang Lu

AI Summary

This paper introduces Semantic-Topological Graph Reasoning (STGR), a framework that combines LLaMA-3-V and MedSAM for language-guided pulmonary screening. STGR uses a Text-to-Vision Intent Distillation (TVID) module to extract diagnostic guidance and formulates mask selection as a dynamic graph reasoning problem to resolve anatomical ambiguity. By employing a Selective Asymmetric Fine-Tuning (SAFT) strategy, the framework achieves state-of-the-art performance with an 81.5% Dice Similarity Coefficient (DSC) on LIDC-IDRI, surpassing existing LLM-based tools while maintaining cross-fold stability.
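For readers unfamiliar with the headline metric, the Dice Similarity Coefficient between a predicted mask A and a reference mask B is 2|A∩B| / (|A| + |B|). The following is a minimal pure-Python sketch of that standard definition, not the paper's evaluation code; the function name, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A∩B| / (|A| + |B|) for two binary masks.

    Masks are nested lists of 0/1 values (an assumed format for this sketch).
    """
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(p & t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 3 foreground pixels each, 2 of them shared.
example_pred = [[1, 1, 0], [0, 1, 0]]
example_target = [[1, 0, 0], [0, 1, 1]]
example_dice = dice_coefficient(example_pred, example_target)  # 2*2 / (3+3)
```

A reported DSC of 81.5% corresponds to a value of 0.815 on this scale, typically averaged over test cases.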

Key Contribution

Achieves state-of-the-art pulmonary nodule segmentation by distilling language guidance into a graph-reasoning framework while fine-tuning less than 1% of the parameters.

Abstract

Medical image segmentation driven by free-text clinical instructions is a critical frontier in computer-aided diagnosis. However, existing multimodal and foundation models struggle with the semantic ambiguity of clinical reports and fail to disambiguate complex anatomical overlaps in low-contrast scans. Furthermore, fully fine-tuning these massive architectures on limited medical datasets invariably leads to severe overfitting. To address these challenges, we propose a novel Semantic-Topological Graph Reasoning (STGR) framework for language-guided pulmonary screening. Our approach elegantly synergizes the reasoning capabilities of large language models (LLaMA-3-V) with the zero-shot delineation of vision foundation models (MedSAM). Specifically, we introduce a Text-to-Vision Intent Distillation (TVID) module to extract precise diagnostic guidance. To resolve anatomical ambiguity, we formulate mask selection as a dynamic graph reasoning problem, where candidate lesions are modeled as nodes and edges capture spatial and semantic affinities. To ensure deployment feasibility, we introduce a Selective Asymmetric Fine-Tuning (SAFT) strategy that updates less than 1% of the parameters. Rigorous 5-fold cross-validation on the LIDC-IDRI and LNDb datasets demonstrates that our framework establishes a new state-of-the-art. Notably, it achieves an 81.5% Dice Similarity Coefficient (DSC) on LIDC-IDRI, outperforming leading LLM-based tools like LISA by over 5%. Crucially, our SAFT strategy acts as a powerful regularizer, yielding exceptional cross-fold stability (0.6% DSC variance) and paving the way for robust, context-aware clinical deployment.
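The abstract describes mask selection as graph reasoning, with candidate lesions as nodes and edges carrying spatial and semantic affinities, but it does not give the formulation. The sketch below is one hypothetical way such a selection step could look, not the authors' method: the `Candidate` fields, the affinity formula, the `scale`, `rounds`, and `alpha` parameters, and the simple score-propagation rule are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A candidate lesion mask (hypothetical structure for this sketch)."""
    name: str
    centroid: Tuple[float, float]  # mask centre in pixels
    text_score: float              # assumed text-to-mask match score in [0, 1]


def edge_weight(a: Candidate, b: Candidate, scale: float = 20.0) -> float:
    """Assumed affinity: spatial closeness times semantic similarity."""
    spatial = math.exp(-math.dist(a.centroid, b.centroid) / scale)
    semantic = 1.0 - abs(a.text_score - b.text_score)
    return spatial * semantic


def select_mask(candidates: List[Candidate], rounds: int = 2, alpha: float = 0.5):
    """Propagate scores over the candidate graph, then pick the top node.

    Each round mixes a node's own score with the affinity-weighted mean of
    its neighbours' scores; alpha controls how much neighbours contribute.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    n = len(candidates)
    weights = [
        [edge_weight(candidates[i], candidates[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    scores = [c.text_score for c in candidates]
    for _ in range(rounds):
        updated = []
        for i in range(n):
            norm = sum(weights[i])
            if norm > 0:
                neighbour = sum(weights[i][j] * scores[j] for j in range(n)) / norm
            else:
                neighbour = scores[i]  # isolated node keeps its own score
            updated.append((1.0 - alpha) * scores[i] + alpha * neighbour)
        scores = updated
    best = max(range(n), key=scores.__getitem__)
    return candidates[best], scores


# Toy candidates: two nearby masks and one distant mask.
toy_candidates = [
    Candidate("A", (10.0, 10.0), 0.9),
    Candidate("B", (12.0, 11.0), 0.8),
    Candidate("C", (200.0, 200.0), 0.4),
]
chosen, final_scores = select_mask(toy_candidates)
```

With `alpha=0.0` the propagation is disabled and the function reduces to picking the candidate with the highest text score, which makes it easy to compare graph-based selection against a per-mask baseline.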

Computer Vision, Multimodal Models, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 23

Year: 2026

Venue: N/A
