NVIDIA, AI Matics, Chung-Ang, Dankook University, Ewha Womans University, Sangmyung University, SNU

Jan 18, 2026

arXiv:2604.00514

MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning

Kyeonghun Kim, Hye-Won Jung, Y. Han, Junsu Lim, Yeonju Jean, Seongbin Park, E. Choi, Hyunsu Go, Seoyoung Ju, Seohyoung Park, Gyeongmin Kim, Min-Jin Kwon, Kyungseok Yuh, Soo Yong Kim, K. Liao, N. Kim, Hyuk-Jae Lee

AI Summary

The paper introduces MAESIL, a novel masked autoencoder framework for self-supervised learning on 3D medical images that addresses the limitations of existing methods that treat 3D volumes as independent 2D slices. MAESIL uses a "superpatch" approach, partitioning the volume into 3D chunks and employing a dual-masking strategy within a 3D masked autoencoder to better capture spatial relationships. Experiments on three CT datasets demonstrate that MAESIL significantly improves reconstruction metrics like PSNR and SSIM compared to AE, VAE, and VQ-VAE, establishing it as a strong pre-training method.

Key Contribution

Medical imaging AI can now leverage a self-supervised pre-training method that understands 3D context, boosting reconstruction quality beyond what's possible with 2D-centric approaches.

Abstract

Training deep learning models for three-dimensional (3D) medical imaging, such as Computed Tomography (CT), is fundamentally challenged by the scarcity of labeled data. While pre-training on natural images is common, it introduces a significant domain shift that limits performance. Self-Supervised Learning (SSL) on unlabeled medical data has emerged as a powerful solution, but prominent frameworks often fail to exploit the inherent 3D nature of CT scans. These methods typically process a 3D scan as a collection of independent 2D slices, an approach that discards critical axial coherence and 3D structural context. To address this limitation, we propose the Masked Autoencoder for Enhanced Self-supervised Medical Image Learning (MAESIL), a novel self-supervised learning framework designed to capture 3D structural information efficiently. The core innovation is the ‘superpatch,’ a 3D chunk-based input unit that balances 3D context preservation with computational efficiency. Our framework partitions the volume into superpatches and employs a 3D masked autoencoder with a dual-masking strategy to learn comprehensive spatial representations. We validated our approach on three diverse, large-scale public CT datasets. Our experimental results show that MAESIL significantly improves on existing methods, including AE, VAE, and VQ-VAE, in key reconstruction metrics such as PSNR and SSIM. This establishes MAESIL as a robust and practical pre-training solution for 3D medical imaging tasks.
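To make the chunk-based input and the PSNR metric concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not state the superpatch size, the mask ratio, or what the two levels of the dual-masking strategy are, so the chunk shape `(16, 32, 32)`, the ratio `0.75`, the zero-fill masking, and the helper names `partition_into_chunks`, `random_chunk_mask`, and `psnr` are all illustrative assumptions. Only a single level of random masking is shown.

```python
import numpy as np


def partition_into_chunks(volume, size):
    """Split a (D, H, W) volume into non-overlapping 3D chunks of shape `size`.

    Assumption: every dimension is divisible by the chunk size.
    Returns an array of shape (N, d, h, w).
    """
    D, H, W = volume.shape
    d, h, w = size
    if D % d or H % h or W % w:
        raise ValueError("volume dimensions must be divisible by the chunk size")
    # Split each axis into (number of chunks, chunk length) ...
    chunks = volume.reshape(D // d, d, H // h, h, W // w, w)
    # ... then move the three chunk-index axes to the front and flatten them.
    chunks = chunks.transpose(0, 2, 4, 1, 3, 5)
    return chunks.reshape(-1, d, h, w)


def random_chunk_mask(chunks, mask_ratio, rng):
    """Zero out a random subset of chunks (hypothetical single-level masking).

    Returns the masked copy and a boolean vector marking the masked chunks.
    """
    n = chunks.shape[0]
    n_masked = int(round(mask_ratio * n))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_masked, replace=False)] = True
    masked = chunks.copy()
    masked[mask] = 0.0
    return masked, mask


def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in dB for intensities spanning `data_range`."""
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


# Usage on a synthetic volume (random values standing in for a CT scan).
rng = np.random.default_rng(0)
volume = rng.random((32, 64, 64))
chunks = partition_into_chunks(volume, (16, 32, 32))   # 2 * 2 * 2 = 8 chunks
masked, mask = random_chunk_mask(chunks, 0.75, rng)    # 6 of 8 chunks zeroed
```

In a masked-autoencoder setup, a model would receive `masked` and be trained to reconstruct `chunks`; `psnr` could then score the reconstruction against the original. SSIM is omitted here because it needs a windowed implementation that goes beyond a short sketch.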

Computer Vision, Data Curation & Synthetic Data, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 18

Year: 2026

Venue: International Conference on Electronics, Information and Communications
