Tsinghua AI | Hangzhou Dianzi University | UChicago | ZJU | May 21, 2026 | arXiv:2605.22619

GLeVE: Graph-Guided Lesion Grounding with Proposal Verification in 3D CT

Yuhao Hong, Chunbo Jiang, Weihong Chen, Huangwei Chen, Shenghao Zhu, Beining Wu, Mingxuan Liu, Zhu Zhu, Feiwei Qin, Min Tan, Yifei Chen

AI Summary

This paper introduces GLeVE, a graph-guided framework for grounding radiology reports to 3D CT volumes by treating each lesion description as an atomic semantic unit. GLeVE uses relation-aware graph reasoning to encode organ attribution, attributes, and inter-lesion relations, combined with anatomy-aware proposal generation and octree-based refinement. Experiments on AbdomenAtlas 3.0 show that GLeVE achieves improved segmentation accuracy and lesion-level localization compared to existing methods.

Key Contribution

GLeVE grounds radiology reports to 3D CT volumes with a graph-guided lesion grounding framework, supporting verifiable clinical interpretation. On AbdomenAtlas 3.0 it reports gains over multimodal foundation models and report-supervised baselines.

Abstract

Grounding radiology report descriptions to 3D CT volumes is essential for verifiable clinical interpretation, yet remains challenging due to the semantic-spatial gap between free-text narratives and volumetric anatomy. Existing report-assisted and vision-language grounding methods typically rely on phrase-level alignment or dense pixel supervision, resulting in limited lesion-wise correspondence and suboptimal localization accuracy. We propose GLeVE, a graph-guided lesion grounding framework with anatomical prior verification and octree-based autoregressive refinement. GLeVE treats each lesion description as an atomic semantic unit and encodes organ attribution, attributes, and inter-lesion relations through relation-aware graph reasoning to produce discriminative lesion-wise queries. Anatomy-aware proposal generation with region-level verification enforces one-to-one text-lesion alignment, while hierarchical octree refinement progressively improves boundary delineation. Experiments on AbdomenAtlas 3.0 demonstrate consistent gains over classical multimodal foundation models and report-supervised baselines in both segmentation accuracy and lesion-level localization.
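The one-to-one text-lesion alignment mentioned in the abstract can be illustrated with a small assignment sketch. This is not the authors' method: the abstract does not say how proposals are scored or verified, so the similarity matrix, the `match_one_to_one` helper, and the `min_score` threshold below are all hypothetical. The sketch only shows why a global one-to-one assignment can differ from picking the best proposal for each description independently.

```python
from itertools import permutations


def match_one_to_one(similarity, min_score=0.5):
    """Assign each lesion description to at most one region proposal.

    similarity[i][j] is a hypothetical score between description i and
    proposal j. The search is exhaustive, so it only suits tiny inputs.
    """
    n_text = len(similarity)
    n_prop = len(similarity[0]) if similarity else 0
    if n_text > n_prop:
        raise ValueError("need at least as many proposals as descriptions")

    best_total, best_perm = float("-inf"), ()
    # Try every way of giving the descriptions distinct proposals.
    for perm in permutations(range(n_prop), n_text):
        total = sum(similarity[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm

    # Assumed verification step: drop pairs scoring below the threshold.
    return {i: j for i, j in enumerate(best_perm) if similarity[i][j] >= min_score}


# Two descriptions, three proposals (made-up scores).
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
assignment = match_one_to_one(scores)
print(assignment)
```

Here both descriptions prefer proposal 0. The global assignment gives description 0 its second choice so that description 1 can keep proposal 0, instead of leaving description 1 with a low-scoring proposal that fails the threshold. For realistic numbers of lesions, a polynomial-time assignment solver would replace the exhaustive search.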

Computer Vision, Multimodal Models, Natural Language Processing

Year: 2026

Venue: N/A
