Graphic Era University, IITIIT Delhi, NIMS University Jaipur, Purdue

Feb 18, 2026

arXiv:2602.16320

RefineFormer3D: Efficient 3D Medical Image Segmentation via Adaptive Multi-Scale Transformer with Cross Attention Fusion

Vishwas Rathi, Kavyansh Tyagi, Puneet Goyal

AI Summary

The paper introduces RefineFormer3D, a lightweight 3D transformer architecture for medical image segmentation designed to balance accuracy and computational efficiency. It utilizes GhostConv3D for patch embedding, a MixFFN3D module with low-rank projections, and a cross-attention fusion decoder for adaptive multi-scale skip connections. Experiments on ACDC and BraTS datasets show that RefineFormer3D achieves competitive Dice scores (93.44% and 85.9% respectively) with only 2.94M parameters and fast inference times, outperforming or matching state-of-the-art methods with significantly fewer parameters.
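The Dice scores quoted above measure overlap between predicted and reference label maps. As a rough illustration of the metric only (not the authors' evaluation code), the sketch below computes a per-class Dice coefficient and its mean over foreground classes on a toy flattened volume. The function names `dice_score` and `mean_dice`, the toy labels, and the convention for empty classes are assumptions made for this example.

```python
def dice_score(pred, target, label):
    """Dice coefficient for one class label over two flat label sequences."""
    pred_mask = [p == label for p in pred]
    target_mask = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    denom = sum(pred_mask) + sum(target_mask)
    # Assumed convention: a class absent from both volumes counts as a perfect match.
    return 1.0 if denom == 0 else 2.0 * intersection / denom


def mean_dice(pred, target, labels):
    """Average the per-class Dice over the given foreground labels."""
    return sum(dice_score(pred, target, c) for c in labels) / len(labels)


# Toy flattened "volume": 0 is background, 1 and 2 are foreground classes.
pred = [0, 1, 1, 2, 2, 2, 0, 1]
target = [0, 1, 1, 1, 2, 2, 0, 0]

print(dice_score(pred, target, 1))    # class 1 overlap
print(dice_score(pred, target, 2))    # class 2 overlap
print(mean_dice(pred, target, [1, 2]))
```

Benchmarks differ in which classes are averaged and how empty classes are handled, so figures computed this way are only comparable when those conventions match.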

Key Contribution

A lightweight hierarchical transformer for 3D medical image segmentation that matches or outperforms state-of-the-art methods on ACDC and BraTS (93.44% and 85.9% average Dice) with only 2.94M parameters and 8.35 ms per-volume GPU inference.

Abstract

Accurate and computationally efficient 3D medical image segmentation remains a critical challenge in clinical workflows. Transformer-based architectures often demonstrate superior global contextual modeling but at the expense of excessive parameter counts and memory demands, restricting their clinical deployment. We propose RefineFormer3D, a lightweight hierarchical transformer architecture that balances segmentation accuracy and computational efficiency for volumetric medical imaging. The architecture integrates three key components: (i) GhostConv3D-based patch embedding for efficient feature extraction with minimal redundancy, (ii) a MixFFN3D module with low-rank projections and depthwise convolutions for parameter-efficient feature extraction, and (iii) a cross-attention fusion decoder enabling adaptive multi-scale skip connection integration. RefineFormer3D contains only 2.94M parameters, substantially fewer than contemporary transformer-based methods. Extensive experiments on ACDC and BraTS benchmarks demonstrate that RefineFormer3D achieves 93.44% and 85.9% average Dice scores respectively, outperforming or matching state-of-the-art methods while requiring significantly fewer parameters. Furthermore, the model achieves fast inference (8.35 ms per volume on GPU) with low memory requirements, supporting deployment in resource-constrained clinical environments. These results establish RefineFormer3D as an effective and scalable solution for practical 3D medical image segmentation.
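To give a feel for why a ghost-style convolution can reduce parameters, the sketch below compares weight counts for a standard 3D convolution and a ghost module extended to 3D. It assumes the general GhostNet-style formulation: a primary convolution produces a fraction of the output channels, and cheap depthwise convolutions generate the rest. The abstract does not give the paper's actual GhostConv3D configuration, so the channel sizes, the `ratio` and `cheap_k` parameters, and the function names are hypothetical. Biases are ignored.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution with a k x k x k kernel (no bias)."""
    return c_in * c_out * k ** 3


def ghost_conv3d_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weights in a ghost-style 3D module (assumed formulation, no bias).

    A primary convolution produces c_out // ratio "intrinsic" channels; each
    remaining channel comes from a depthwise cheap_k^3 convolution applied to
    one intrinsic channel.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k ** 3
    cheap = intrinsic * (ratio - 1) * cheap_k ** 3  # one depthwise kernel per generated channel
    return primary + cheap


# Hypothetical layer sizes chosen only for illustration.
standard = conv3d_params(32, 64, 3)
ghost = ghost_conv3d_params(32, 64, 3)
print(standard, ghost, round(ghost / standard, 3))
```

With `ratio=2` the primary convolution carries half the weights of the standard layer, and the depthwise branch adds only a small remainder, so the total lands just above half in this toy setting.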

Architecture Design (Transformers, SSMs, MoE); Computer Vision; Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 44

Year: 2026

Venue: N/A
