This paper introduces Deep Reprogramming Distillation (DRD), a novel framework for adapting medical foundation models to specific downstream tasks. DRD uses a reprogramming module to bridge the gap between pre-training and downstream scenarios, and it employs centered kernel alignment (CKA) distillation to make knowledge transfer more robust under varying training conditions. Experiments on 18 medical tasks, including 2D/3D classification and segmentation, show that DRD outperforms existing parameter-efficient fine-tuning (PEFT) and knowledge distillation (KD) methods.
Forget PEFT and KD: reprogramming distillation offers a surprisingly effective and robust way to adapt large medical foundation models to diverse downstream tasks.
Medical foundation models pre-trained on large-scale datasets have shown strong and versatile performance. However, adapting them to specific medical scenarios remains an unavoidable challenge. The difficulty comes from the discrepancy between pre-training and downstream tasks, as well as from real-world computation and speed constraints. Existing techniques that might address this challenge all suffer, to varying degrees, from intrinsic limitations. For example, knowledge distillation (KD) assumes that teacher and student models share the same task, training strategy, and model-structure family, while prevalent parameter-efficient fine-tuning (PEFT) fails to achieve personalized, lightweight deployment. Even combining PEFT and KD still struggles to resolve inconsistencies in model structure and training strategy between teacher and student models, which leads to inefficient knowledge transfer. In this study, we propose a novel framework called Deep Reprogramming Distillation (DRD) to tackle this general adaptation challenge. Specifically, DRD introduces a novel reprogramming module that, on the one hand, overcomes the domain and task discrepancy between pre-training and downstream scenarios and, on the other hand, enables student-friendly, efficient distillation from foundation models to lightweight downstream models. Furthermore, to mitigate variability under different training conditions, we design a centered kernel alignment (CKA) distillation method that promotes robust knowledge transfer. Empirical results show that DRD surpasses previous PEFT and KD methods on 18 medical downstream tasks across different foundation models, covering scenarios that include 2D/3D classification and 2D/3D segmentation.
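The abstract does not spell out the CKA distillation objective. For reference, the widely used linear form of centered kernel alignment compares student features X (n x d_s) and teacher features Y (n x d_t), computed on the same n samples and column-centered, as CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F). A simple way to turn this into a loss is 1 - CKA. The sketch below is a minimal illustration under that assumption. The function names `linear_cka` and `cka_distillation_loss` are hypothetical, and DRD's actual loss (kernel choice, which layers are aligned, loss weighting) may differ. It uses NumPy only to evaluate the value. In training, the same expression would be written in an autodiff framework so that gradients reach the student.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear centered kernel alignment between two feature matrices.

    x: (n, d1) features for n samples (e.g. from the student).
    y: (n, d2) features for the same n samples (e.g. from the teacher).
    d1 and d2 may differ; only the number of samples must match.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must contain the same number of samples")
    # Center every feature column across the batch.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Numerator: squared Frobenius norm of the cross-covariance-like term.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    # Normalisers: Frobenius norms of each matrix's own Gram-like term.
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_distillation_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """Hypothetical distillation objective: 0 when the representations align."""
    return 1.0 - linear_cka(student, teacher)
```

Only the number of rows has to match, so a narrow student can be aligned with a much wider foundation-model teacher. Linear CKA is also unchanged by isotropic scaling and orthogonal rotation of either feature space. Both properties plausibly suit the teacher/student structural mismatch that the abstract describes.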