This paper introduces a differentially private, zero-order optimization framework to extend dataset condensation (DC) to non-differentiable clinical models like decision trees and Cox regression. The method uses only function evaluations to learn a compact, synthetic dataset that preserves model utility while providing differential privacy guarantees. Experiments across six clinical datasets demonstrate that the condensed datasets enable model-agnostic data sharing for clinical prediction tasks without exposing sensitive patient information.
Dataset condensation, previously limited to neural networks, can now democratize access to clinical data by enabling privacy-preserving training of classical models like decision trees and Cox regression.
Dataset condensation (DC) learns a compact synthetic dataset that enables models to match the performance of full-data training, prioritising utility over distributional fidelity. While typically explored for computational efficiency, DC also holds promise for healthcare data democratisation, especially when paired with differential privacy, allowing synthetic data to serve as a safe alternative to real records. However, existing DC methods rely on differentiable neural networks, limiting their compatibility with widely used clinical models such as decision trees and Cox regression. We address this gap using a differentially private, zero-order optimisation framework that extends DC to non-differentiable models using only function evaluations. Empirical results across six datasets, including both classification and survival tasks, show that the proposed method produces condensed datasets that preserve model utility while providing effective differential privacy guarantees, enabling model-agnostic data sharing for clinical prediction tasks without exposing sensitive patient information.
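To make the core idea concrete, here is a minimal sketch of zero-order dataset condensation for a non-differentiable model. The objective, update rule, and Gaussian noise scale below are illustrative assumptions, not the paper's exact mechanism: a two-point (SPSA-style) gradient estimate is computed from function evaluations alone, with added noise standing in for the differential privacy mechanism.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical sketch: names, hyperparameters, and the utility objective are
# illustrative assumptions, not the method described in the paper.

def utility(synth_X, synth_y, real_X, real_y):
    """Train a non-differentiable model on the synthetic set, score on real data."""
    model = DecisionTreeClassifier(max_depth=4, random_state=0)
    model.fit(synth_X, synth_y)
    return model.score(real_X, real_y)

def zo_condense(real_X, real_y, n_synth=20, steps=100, mu=0.1, lr=0.5,
                noise_mult=0.0, seed=0):
    """Learn a small synthetic dataset via zeroth-order (function-evaluation-only)
    optimisation; `noise_mult` stands in for DP noise on the gradient estimate."""
    rng = np.random.default_rng(seed)
    d = real_X.shape[1]
    synth_X = rng.normal(size=(n_synth, d))
    synth_y = np.arange(n_synth) % 2  # fixed balanced labels for simplicity
    for _ in range(steps):
        # Two-point SPSA-style estimate: perturb all coordinates with a random
        # sign vector and difference two function evaluations. No model gradients.
        u = rng.choice([-1.0, 1.0], size=synth_X.shape)
        f_plus = utility(synth_X + mu * u, synth_y, real_X, real_y)
        f_minus = utility(synth_X - mu * u, synth_y, real_X, real_y)
        grad_est = (f_plus - f_minus) / (2 * mu) * u
        # Gaussian noise here is a placeholder for the DP mechanism.
        grad_est += noise_mult * rng.normal(size=grad_est.shape)
        synth_X += lr * grad_est  # ascend the utility objective
    return synth_X, synth_y
```

Because a decision tree's accuracy is piecewise constant in its inputs, many perturbations yield a zero difference; in practice this is why careful perturbation scales and smoothing matter for zero-order methods on such models.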