The paper introduces DCAU-Net, a medical image segmentation framework designed to improve accuracy and efficiency by addressing limitations of both transformer-based and CNN-based approaches. Its Differential Cross Attention (DCA) mechanism computes the difference between two softmax attention maps to highlight discriminative structures, and uses window-level summary tokens as keys and values in place of pixel-wise tokens to reduce computational complexity. Additionally, a Channel-Spatial Feature Fusion (CSFF) strategy adaptively recalibrates features from skip connections and up-sampling paths using sequential channel and spatial attention.
By computing the *difference* between two softmax attention maps and attending to window-level summary tokens instead of every pixel, DCAU-Net reports competitive segmentation accuracy on two public benchmarks at a much lower computational cost than standard self-attention.
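As a rough illustration of this idea, the NumPy sketch below subtracts one softmax attention map from another while letting each pixel query attend only to window-level summary tokens. It is not the authors' implementation: the average-pooled summaries, the projection matrices in `params`, and the fixed weight `lam` are assumptions made for illustration, loosely following the common differential-attention form `softmax(Q1 K1^T) - lam * softmax(Q2 K2^T)`.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_tokens(feat, win):
    # Assumption: each window is summarised by average pooling.
    # feat: (H, W, C) -> (M, C) with M = (H // win) * (W // win).
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0
    pooled = feat.reshape(H // win, win, W // win, win, C).mean(axis=(1, 3))
    return pooled.reshape(-1, C)

def differential_cross_attention(feat, win, params, lam=0.5):
    # Queries come from every pixel; keys and values come from window summaries.
    H, W, C = feat.shape
    pixels = feat.reshape(-1, C)            # (N, C), N = H * W
    summary = window_tokens(feat, win)      # (M, C), M = N / win**2
    d = params["Wq1"].shape[1]
    q1, q2 = pixels @ params["Wq1"], pixels @ params["Wq2"]
    k1, k2 = summary @ params["Wk1"], summary @ params["Wk2"]
    v = summary @ params["Wv"]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))    # (N, M) first attention map
    a2 = softmax(q2 @ k2.T / np.sqrt(d))    # (N, M) second attention map
    diff = a1 - lam * a2                    # differential attention map
    return (diff @ v).reshape(H, W, -1), diff

# Hypothetical sizes and random weights, chosen only for the demo.
rng = np.random.default_rng(0)
H, W, C, d, win = 8, 8, 16, 8, 4
feat = rng.standard_normal((H, W, C))
params = {n: rng.standard_normal((C, d)) * 0.1 for n in ("Wq1", "Wq2", "Wk1", "Wk2")}
params["Wv"] = rng.standard_normal((C, C)) * 0.1

out, diff = differential_cross_attention(feat, win, params)
n_pixels, n_windows = H * W, (H // win) * (W // win)
print(out.shape, diff.shape)
print("scores per map:", n_pixels * n_windows, "vs full self-attention:", n_pixels * n_pixels)
```

In this toy setting each attention map holds 64 × 4 = 256 scores instead of the 64 × 64 = 4096 that pixel-to-pixel self-attention would need, which is where the cost saving comes from. The subtraction cancels attention mass that both maps assign to the same tokens, so only the weights on which the two maps disagree survive.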
Accurate medical image segmentation requires effective modeling of both long-range dependencies and fine-grained boundary details. While transformers mitigate the issue of insufficient semantic information arising from the limited receptive field inherent in convolutional neural networks, they introduce new challenges: standard self-attention incurs quadratic computational complexity and often assigns non-negligible attention weights to irrelevant regions, diluting focus on discriminative structures and ultimately compromising segmentation accuracy. Existing attention variants, although effective in reducing computational complexity, fail to suppress redundant computation and inadvertently impair global context modeling. Furthermore, conventional fusion strategies in encoder-decoder architectures, typically based on simple concatenation or summation, cannot adaptively integrate high-level semantic information with low-level spatial details. To address these limitations, we propose DCAU-Net, a novel and efficient segmentation framework built on two key ideas. First, a new Differential Cross Attention (DCA) is designed to compute the difference between two independent softmax attention maps to adaptively highlight discriminative structures. By replacing pixel-wise key and value tokens with window-level summary tokens, DCA dramatically reduces computational complexity without sacrificing precision. Second, a Channel-Spatial Feature Fusion (CSFF) strategy is introduced to adaptively recalibrate features from skip connections and up-sampling paths using sequential channel and spatial attention, effectively suppressing redundant information and amplifying salient cues. Experiments on two public benchmarks demonstrate that DCAU-Net achieves competitive performance with enhanced segmentation accuracy and robustness.
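The CSFF step can be pictured with a similar sketch. The abstract only states that channel attention and spatial attention are applied in sequence to skip-connection and up-sampled features, so everything else here is an assumption: the concatenation of the two inputs, the squeeze-and-excitation style channel gate, the per-pixel spatial gate built from channel-wise mean and max, and the hypothetical weights `W1`, `W2`, and `w`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, W1, W2):
    # Assumption: squeeze spatially, pass through a 2-layer MLP, gate each channel.
    squeezed = x.mean(axis=(0, 1))              # (C,)
    hidden = np.maximum(squeezed @ W1, 0.0)     # ReLU bottleneck
    gate = sigmoid(hidden @ W2)                 # (C,), values in (0, 1)
    return x * gate

def spatial_attention(x, w):
    # Assumption: gate each pixel from its channel-wise mean and max
    # (a 1x1 simplification of the larger convolutions often used here).
    desc = np.stack([x.mean(axis=-1), x.max(axis=-1)], axis=-1)  # (H, W, 2)
    gate = sigmoid(desc @ w)                    # (H, W), values in (0, 1)
    return x * gate[..., None]

def csff(skip, up, W1, W2, w):
    # Fuse encoder skip features with up-sampled decoder features,
    # then recalibrate: channel attention first, spatial attention second.
    fused = np.concatenate([skip, up], axis=-1)
    return spatial_attention(channel_attention(fused, W1, W2), w)

# Hypothetical sizes and random weights, chosen only for the demo.
rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 8
skip = rng.standard_normal((H, W, C))
up = rng.standard_normal((H, W, C))
W1 = rng.standard_normal((2 * C, r)) * 0.1
W2 = rng.standard_normal((r, 2 * C)) * 0.1
w = rng.standard_normal(2)

out = csff(skip, up, W1, W2, w)
print(out.shape)
```

Because both gates lie strictly between 0 and 1, the fused features can only be attenuated, never amplified, in this sketch. Salient channels and locations are kept closer to their original magnitude while the rest are damped, in contrast to plain concatenation or summation, which weights every feature equally.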