Feb 16, 2026 · arXiv:2602.14498

Uncertainty-Aware Vision-Language Segmentation for Medical Imaging

Aryan Das, Tanishq Rachamalla, Koushik Biswas, Swalpa Kumar Roy, Vinay Kumar Verma

AI Summary

This paper introduces an uncertainty-aware vision-language segmentation framework for medical imaging that fuses radiological images and clinical text. The framework employs a Modality Decoding Attention Block (MoDAB) with a State Space Mixer (SSMix) for efficient cross-modal fusion and long-range dependency modeling. A Spectral-Entropic Uncertainty (SEU) Loss is proposed to guide learning under ambiguity by capturing spatial overlap, spectral consistency, and predictive uncertainty, leading to improved segmentation performance and computational efficiency on QATA-COVID19, MosMed++, and Kvasir-SEG datasets.

Key Contribution

Improves medical image segmentation over existing state-of-the-art approaches by explicitly modeling predictive uncertainty and efficiently fusing vision and language modalities.

Abstract

We introduce a novel uncertainty-aware multimodal segmentation framework that leverages both radiological images and associated clinical text for precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which jointly captures spatial overlap, spectral consistency, and predictive uncertainty in a unified objective. In complex clinical circumstances with poor image quality, this formulation improves model reliability. Extensive experiments on various publicly available medical datasets, QATA-COVID19, MosMed++, and Kvasir-SEG, demonstrate that our method achieves superior segmentation performance while being significantly more computationally efficient than existing State-of-the-Art (SoTA) approaches. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation tasks. Code: https://github.com/arya-domain/UA-VLS
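The abstract names the three ingredients of the SEU Loss (spatial overlap, spectral consistency, predictive uncertainty) but does not give its formula. The sketch below is only an illustration of how such terms could be combined into one objective; it is not the authors' formulation. It assumes a Dice term for spatial overlap, a difference of 2-D FFT magnitudes for spectral consistency, and the mean binary entropy of the predicted probabilities for uncertainty. The function names and the weights `w_spec` and `w_ent` are hypothetical.

```python
import numpy as np


def dice_loss(prob, target, eps=1e-6):
    """Spatial-overlap term: 1 - soft Dice coefficient."""
    inter = np.sum(prob * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)


def spectral_loss(prob, target):
    """Spectral term (assumed form): mean absolute difference between
    the 2-D FFT magnitude spectra of prediction and ground truth."""
    mag_pred = np.abs(np.fft.fft2(prob, norm="ortho"))
    mag_true = np.abs(np.fft.fft2(target, norm="ortho"))
    return float(np.mean(np.abs(mag_pred - mag_true)))


def binary_entropy(prob, eps=1e-6):
    """Uncertainty term (assumed form): mean per-pixel binary entropy."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(np.mean(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))))


def seu_like_loss(prob, target, w_spec=0.1, w_ent=0.1):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (
        dice_loss(prob, target)
        + w_spec * spectral_loss(prob, target)
        + w_ent * binary_entropy(prob)
    )


# Toy example: an 8x8 mask with a 4x4 foreground square.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
unsure = np.full((8, 8), 0.5)  # maximally uncertain prediction

print(seu_like_loss(mask, mask))    # near zero for a perfect prediction
print(seu_like_loss(unsure, mask))  # larger for an uncertain prediction
```

In this toy setup a confident, correct prediction drives all three terms toward zero, while a uniform 0.5 prediction is penalized by both the overlap and entropy terms. The released code at the repository linked above is the reference for the actual loss.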

Computer Vision · Multimodal Models · Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 43

Year: 2026

Venue: N/A
