Mar 3, 2026 | arXiv:2603.03187

ProSMA-UNet: Decoder Conditioning for Proximal-Sparse Skip Feature Selection

Chun-Wun Cheng, Yanqi Cheng, Peiyuan Jing, Guang Yang, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

AI Summary

The paper introduces ProSMA-UNet, a U-Net architecture for medical image segmentation that targets the propagation of irrelevant information through skip connections by reformulating skip gating as a decoder-conditioned sparse feature selection problem. ProSMA-UNet builds a multi-scale compatibility field and applies an $\ell_1$ proximal operator with learnable per-channel thresholds to enforce sparsity in skip connections, which removes noisy responses; a decoder-conditioned channel gate additionally suppresses irrelevant channels. Experiments on 2D and 3D medical image segmentation benchmarks report state-of-the-art performance, with gains of roughly 20% on difficult 3D tasks.

Key Contribution

Achieves gains of roughly 20% on difficult 3D medical image segmentation tasks by explicitly removing noisy activations in U-Net skip connections with a proximal-sparse attention mechanism.

Abstract

Medical image segmentation commonly relies on U-shaped encoder-decoder architectures such as U-Net, where skip connections preserve fine spatial detail by injecting high-resolution encoder features into the decoder. However, these skip pathways also propagate low-level textures, background clutter, and acquisition noise, allowing irrelevant information to bypass deeper semantic filtering, an issue that is particularly detrimental in low-contrast clinical imaging. Although attention gates have been introduced to address this limitation, they typically produce dense sigmoid masks that softly reweight features rather than explicitly removing irrelevant activations. We propose ProSMA-UNet (Proximal-Sparse Multi-Scale Attention U-Net), which reformulates skip gating as a decoder-conditioned sparse feature selection problem. ProSMA constructs a multi-scale compatibility field using lightweight depthwise dilated convolutions to capture relevance across local and contextual scales, then enforces explicit sparsity via an $\ell_1$ proximal operator with learnable per-channel thresholds, yielding a closed-form soft-thresholding gate that can remove noisy responses. To further suppress semantically irrelevant channels, ProSMA incorporates decoder-conditioned channel gating driven by global decoder context. Extensive experiments on challenging 2D and 3D benchmarks demonstrate state-of-the-art performance, with particularly large gains ($\approx 20\%$) on difficult 3D segmentation tasks. Project page: https://math-ml-x.github.io/ProSMA-UNet/
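
The closed-form gate mentioned in the abstract rests on a standard result: the proximal operator of $\tau|x|$ is soft-thresholding, $\mathrm{prox}_{\tau|\cdot|}(x) = \mathrm{sign}(x)\max(|x| - \tau, 0)$. The following is a minimal sketch of that operator applied with one threshold per channel. It is not the authors' implementation: the function names `soft_threshold` and `sparse_gate`, the nested-list layout, and the example values are assumptions made for illustration, and the abstract does not specify how the thresholded field is combined with the skip features or how the thresholds are learned.

```python
from typing import List


def soft_threshold(x: float, tau: float) -> float:
    """Proximal operator of tau * |x|: shrink x toward zero by tau.

    Values with |x| <= tau are set exactly to zero, which is what makes
    the result sparse (unlike a sigmoid mask, which only rescales).
    """
    if tau < 0:
        raise ValueError("threshold must be non-negative")
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sparse_gate(compat: List[List[float]], taus: List[float]) -> List[List[float]]:
    """Soft-threshold a [channels][positions] map with one threshold per channel.

    `compat` stands in for a compatibility field (hypothetical layout);
    `taus` stands in for the per-channel thresholds, fixed here rather than learned.
    """
    if len(compat) != len(taus):
        raise ValueError("need exactly one threshold per channel")
    return [[soft_threshold(v, tau) for v in row] for row, tau in zip(compat, taus)]


# Illustrative values only: two channels, three spatial positions each.
compat = [[1.0, 0.125, -0.75],
          [0.25, -0.125, 0.375]]
taus = [0.25, 0.5]

gated = sparse_gate(compat, taus)
print(gated)
```

In this toy input the first channel keeps its two strong responses (shrunk by the threshold) and zeroes the weak one, while every value in the second channel falls below its threshold, so that channel is removed entirely. A dense sigmoid mask would instead leave all six entries nonzero.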

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
