Feb 18, 2026 · arXiv:2602.16245

HyPCA-Net: Advancing Multimodal Fusion in Medical Image Analysis

J. Dhar, M. K. Pandey, D. Chakladar, M. Haghighat, A. Alavi, S. Mistry, N. Zaidi

AI Summary

The paper introduces HyPCA-Net, a novel multimodal fusion framework for medical image analysis designed to address limitations in computational cost and information loss present in existing methods. HyPCA-Net employs a residual adaptive learning attention block for modality-specific refinement and a dual-view cascaded attention block to learn robust shared representations across modalities. Experiments across ten datasets demonstrate that HyPCA-Net achieves up to 5.2% performance improvement and up to 73.1% reduction in computational cost compared to state-of-the-art methods.

Key Contribution

Achieves state-of-the-art results in multimodal medical image analysis, with up to 5.2% better performance and up to 73.1% lower computational cost than existing methods.

Abstract

Multimodal fusion frameworks, which integrate diverse medical imaging modalities (e.g., MRI, CT), have shown great potential in applications such as skin cancer detection, dementia diagnosis, and brain tumor prediction. However, existing multimodal fusion methods face significant challenges. First, they often rely on computationally expensive models, limiting their applicability in low-resource environments. Second, they often employ cascaded attention modules, which can increase the risk of information loss during inter-module transitions and hinder their capacity to capture robust shared representations across modalities. This restricts their generalization in multi-disease analysis tasks. To address these limitations, we propose a Hybrid Parallel-Fusion Cascaded Attention Network (HyPCA-Net), composed of two novel core blocks: (a) a computationally efficient residual adaptive learning attention block for capturing refined modality-specific representations, and (b) a dual-view cascaded attention block aimed at learning robust shared representations across diverse modalities. Extensive experiments on ten publicly available datasets show that HyPCA-Net significantly outperforms existing leading methods, with improvements of up to 5.2% in performance and reductions of up to 73.1% in computational cost. Code: https://github.com/misti1203/HyPCA-Net.
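The general idea of refining each modality with a residual attention step and then combining the results into a shared representation can be illustrated with a toy sketch. This is not the HyPCA-Net architecture: the function names (`residual_attention`, `fuse`), the softmax weighting, the averaging step, and the example vectors are all assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def residual_attention(features):
    """Hypothetical block: reweight features by softmax attention,
    then add the input back through a residual connection."""
    weights = softmax(features)
    return [x + w * x for x, w in zip(features, weights)]

def fuse(modality_a, modality_b):
    """Hypothetical fusion: refine each modality independently,
    then average the refined vectors into one shared vector."""
    refined_a = residual_attention(modality_a)
    refined_b = residual_attention(modality_b)
    return [(a + b) / 2 for a, b in zip(refined_a, refined_b)]

# Made-up feature vectors standing in for two imaging modalities.
mri = [0.2, 1.0, -0.5]
ct = [0.4, 0.1, 0.9]
shared = fuse(mri, ct)
```

The residual term keeps the original features available even where the attention weights are small, which is one common way to limit information loss between stacked modules.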

Computer Vision, Multimodal Models, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 37

Year: 2026

Venue: N/A
