MedPruner, a training-free hierarchical token pruning framework, is introduced to address computational inefficiencies in 3D medical VLMs caused by anatomical redundancy and fixed pruning ratios. It employs an Inter-slice Anchor-based Filtering module to remove slice-level redundancy, followed by Dynamic Information Nucleus Selection based on cumulative attention weights for adaptive token-level compression. Experiments across three 3D medical benchmarks and three medical VLMs show MedPruner can maintain or improve performance with as few as 5% of the original visual tokens, significantly reducing computational costs.
Achieve up to a 20x speedup in 3D medical VLMs, with no loss of accuracy, by pruning away 95% of visual tokens *without* retraining.
While specialized Medical Vision-Language Models (VLMs) have achieved remarkable success in interpreting 2D and 3D medical modalities, their deployment for 3D volumetric data remains constrained by significant computational inefficiencies. Current architectures typically suffer from massive anatomical redundancy due to the direct concatenation of consecutive 2D slices and lack the flexibility to handle heterogeneous information densities across different slices using fixed pruning ratios. To address these challenges, we propose MedPruner, a training-free and model-agnostic hierarchical token pruning framework specifically designed for efficient 3D medical image understanding. MedPruner introduces a two-stage mechanism: an Inter-slice Anchor-based Filtering module to eliminate slice-level temporal redundancy, followed by a Dynamic Information Nucleus Selection strategy that achieves adaptive token-level compression by quantifying cumulative attention weights. Extensive experiments on three 3D medical benchmarks and across three diverse medical VLMs reveal massive token redundancy in existing architectures. Notably, MedPruner enables models such as MedGemma to maintain or even exceed their original performance while retaining fewer than 5% of visual tokens, thereby drastically reducing computational overhead and validating the necessity of dynamic token selection for practical clinical deployment. Our code will be released.
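The second stage, Dynamic Information Nucleus Selection, adapts the number of retained tokens per slice by quantifying cumulative attention weights. The abstract does not give the exact criterion, but the idea can be sketched as a top-p-style selection: keep the smallest set of tokens whose normalized attention mass reaches a threshold, so slices with peaked attention keep few tokens while slices with diffuse attention keep many. The function name, the threshold `p`, and the use of raw per-token attention weights are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dynamic_nucleus_select(attn_weights, p=0.9):
    """Hedged sketch of nucleus-style token selection: return indices of
    the smallest token set whose cumulative attention mass reaches p.
    (Illustrative only; MedPruner's exact criterion may differ.)"""
    w = np.asarray(attn_weights, dtype=float)
    w = w / w.sum()                         # normalize to a distribution
    order = np.argsort(w)[::-1]             # most-attended tokens first
    csum = np.cumsum(w[order])
    k = int(np.searchsorted(csum, p)) + 1   # smallest k with mass >= p
    return np.sort(order[:k])               # retained token indices

# Peaked attention (one dominant token) keeps a single token;
# flat attention keeps all of them -- the ratio adapts per slice.
peaked = dynamic_nucleus_select([100, 1, 1, 1, 1], p=0.9)
flat = dynamic_nucleus_select([1, 1, 1, 1, 1], p=0.9)
```

Under this sketch, the compression ratio is not fixed in advance: it falls directly out of how concentrated the attention distribution is, which is the "heterogeneous information densities" problem the fixed-ratio baselines cannot handle.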