ETH | UZH | May 21, 2026 | arXiv:2605.22611

Benchmarking Machine Learning Architectures for Antimicrobial Stewardship in Pediatric ICUs

Niklas Raehse, Luregn J. Schlapbach, Daphné Chopard

AI Summary

This paper benchmarks machine learning models for predicting antimicrobial stewardship (AMS) interventions in pediatric ICUs, focusing on four targets: IV-to-oral switch, de-escalation, discontinuation, and short-course therapy. The study compares tabular, sequence-based, and graph-based temporal models across a public dataset and a private cohort, finding that predictive performance is primarily driven by target prevalence and dataset characteristics, not model complexity. Sequence models offer improved precision-recall at a 24-hour resolution, but at the cost of poorer calibration compared to simpler tabular models.

Key Contribution

Model complexity is not the main driver of performance when predicting antimicrobial stewardship interventions in pediatric ICUs: sequence models improve the precision-recall trade-off at 24-hour resolution, but are less well calibrated than simpler tabular models.

Abstract

Antimicrobial stewardship (AMS) is critical in pediatric intensive care units (PICUs), where diagnostic uncertainty often drives broad-spectrum antibiotic use, increasing antimicrobial resistance and potential long-term harms. Machine learning offers a promising approach for identifying patient-level opportunities for stewardship interventions from electronic health record data, yet prior work has focused largely on adult populations and static tabular representations. We present a systematic benchmarking study of AMS intervention prediction in the PICU across a public dataset and a private institutional cohort. We define four clinically relevant proxy targets for reducing antibiotic exposure: intravenous-to-oral switching, de-escalation, discontinuation, and short-course therapy. Under a unified evaluation framework, we compare tabular, sequence-based, and graph-based temporal models at multiple temporal resolutions. We find that predictive performance is driven primarily by target prevalence and dataset characteristics rather than model complexity. Sequence models improve the precision-recall trade-off over tabular approaches at coarse (24-hour) resolution, while finer temporal modeling provides limited additional benefit. However, these gains come at the cost of poorer calibration, with simpler tabular models yielding more reliable probability estimates. Multi-task learning produces only marginal improvements, suggesting limited shared structure across stewardship targets. Our findings highlight the importance of target design, temporal representation, and calibration in clinical machine learning, and provide practical guidance for developing reliable decision support systems for pediatric AMS.
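The abstract separates two properties of a predictor: how well it ranks patients (the precision-recall trade-off) and how reliable its probability estimates are (calibration). The sketch below is a minimal illustration of why the two can diverge. It is not the authors' evaluation code. The labels, the two score vectors, and the function names are all hypothetical, and the Brier score and expected calibration error (ECE) are common calibration measures assumed here for illustration, since the abstract does not name the metrics used.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Items are ranked by descending score; precision is accumulated at every
    rank where a positive is found. Ties are broken by input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> float:
    """Weighted mean gap between observed rate and mean predicted probability,
    computed over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(observed - predicted)
    return ece


# Hypothetical toy data: 10 antibiotic-days, 3 with the intervention (30% prevalence).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
probs_a = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.45, 0.50, 0.70, 0.80]
# A strictly monotone distortion keeps the ranking but inflates every probability.
probs_b = [p ** 0.2 for p in probs_a]

ap_a = average_precision(y_true, probs_a)
ap_b = average_precision(y_true, probs_b)
brier_a = brier_score(y_true, probs_a)
brier_b = brier_score(y_true, probs_b)
ece_a = expected_calibration_error(y_true, probs_a)
ece_b = expected_calibration_error(y_true, probs_b)

print(f"average precision: a={ap_a:.3f}  b={ap_b:.3f}")
print(f"Brier score:       a={brier_a:.3f}  b={brier_b:.3f}")
print(f"ECE (10 bins):     a={ece_a:.3f}  b={ece_b:.3f}")
```

Both score vectors rank the patients identically, so their average precision is the same, while the inflated probabilities score worse on both calibration measures. This is why a benchmark of this kind needs to report ranking and calibration metrics side by side rather than relying on one of them.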

Architecture Design (Transformers, SSMs, MoE) | Eval Frameworks & Benchmarks | Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
