Mar 17, 2026 · arXiv:2603.16738

MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning

AI Summary

The authors introduce MedCL-Bench, a new benchmark for evaluating continual learning (CL) strategies in biomedical NLP, addressing the lack of standardized evaluation for catastrophic forgetting in this domain. They evaluated eleven CL strategies across ten datasets and eight task orders, measuring retention, transfer, and GPU-hour cost. Their results show that direct sequential fine-tuning leads to catastrophic forgetting, and different CL methods offer distinct retention-compute trade-offs, with parameter isolation being the most efficient and replay offering strong protection at a higher cost.

Key Contribution

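Biomedical language models suffer severe catastrophic forgetting when updated sequentially. Among continual learning methods, parameter isolation offers the best retention per GPU-hour, which exposes a stability-efficiency trade-off between how much prior-task performance a method preserves and how much compute it costs.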

Abstract

Medical language models must be updated as evidence and terminology evolve, yet sequential updating can trigger catastrophic forgetting. Although biomedical NLP has many static benchmarks, no unified, task-diverse benchmark exists for evaluating continual learning under standardized protocols, robustness to task order and compute-aware reporting. We introduce MedCL-Bench, which streams ten biomedical NLP datasets spanning five task families and evaluates eleven continual learning strategies across eight task orders, reporting retention, transfer, and GPU-hour cost. Across backbones and task orders, direct sequential fine-tuning on incoming tasks induces catastrophic forgetting, causing update-induced performance regressions on prior tasks. Continual learning methods occupy distinct retention-compute frontiers: parameter-isolation provides the best retention per GPU-hour, replay offers strong protection at higher cost, and regularization yields limited benefit. Forgetting is task-dependent, with multi-label topic classification most vulnerable and constrained-output tasks more robust. MedCL-Bench provides a reproducible framework for auditing model updates before deployment.
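The abstract reports retention, transfer, and GPU-hour cost but this page does not give the exact formulas. As a rough illustration only, the sketch below uses definitions that are common in the continual learning literature (average forgetting and backward transfer over a score matrix), not necessarily the ones MedCL-Bench uses. The matrix layout, the function names, and the `final_score_per_gpu_hour` ratio are all assumptions made for this example.

```python
from typing import List

# Assumed layout: acc[i][j] is the score on task j measured right after
# training on the i-th task in the stream. Entries with j > i are unused.
ScoreMatrix = List[List[float]]


def average_forgetting(acc: ScoreMatrix) -> float:
    """Mean drop from the best earlier score to the final score, over all but the last task."""
    n = len(acc)
    if n < 2:
        return 0.0
    drops = []
    for j in range(n - 1):
        best_earlier = max(acc[i][j] for i in range(j, n - 1))
        drops.append(best_earlier - acc[n - 1][j])
    return sum(drops) / len(drops)


def backward_transfer(acc: ScoreMatrix) -> float:
    """Mean change on each earlier task between just after learning it and the end of the stream."""
    n = len(acc)
    if n < 2:
        return 0.0
    return sum(acc[n - 1][j] - acc[j][j] for j in range(n - 1)) / (n - 1)


def final_score_per_gpu_hour(acc: ScoreMatrix, gpu_hours: float) -> float:
    """Hypothetical efficiency ratio: final average score divided by total GPU-hours."""
    n = len(acc)
    final_avg = sum(acc[n - 1][j] for j in range(n)) / n
    return final_avg / gpu_hours


# Toy three-task stream with made-up scores (not results from the paper).
toy = [
    [0.90, 0.00, 0.00],
    [0.70, 0.80, 0.00],
    [0.60, 0.75, 0.85],
]
forgetting = average_forgetting(toy)
bwt = backward_transfer(toy)
efficiency = final_score_per_gpu_hour(toy, gpu_hours=2.0)
```

With definitions like these, a method sits on a better retention-compute frontier when it achieves lower forgetting for the same GPU-hour budget. Averaging the same quantities over several task orders is one way to check robustness to ordering.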

Eval Frameworks & Benchmarks · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
