This paper introduces a framework for transparently reporting data quality (DQ) assessments across the clinical electronic health record (EHR) data lifecycle, with the aim of improving trust in and adoption of clinical AI models. The framework identifies the key lifecycle phases and the actors involved in both data-generating and data-receiving organizations, so that DQ parameters can be mapped to specific stages. Applied to a real-world dataset, the framework helped indicate where DQ issues may originate, improving understanding of data provenance.
Untangling the clinical data lifecycle reveals where data quality issues originate, boosting trust in AI models built on EHR data.
Concerns about the data quality (DQ) and transparency of secondary data delay the adoption of clinical AI models and affect clinician trust in them. Many DQ studies do not state where along the data lifecycle quality checks occur, which leaves provenance and fitness for reuse uncertain. This study develops a framework for transparent reporting of DQ assessments across the clinical electronic health record (EHR) data lifecycle. The reporting framework was developed through iterative analysis to identify the actors and phases of the clinical data lifecycle. It distinguishes between data-generating organizations and data-receiving organizations so that users can map DQ parameters to stages across the data lifecycle. The framework defines five key lifecycle phases and multiple actors. When applied to a real-world dataset, the framework demonstrated applicability in revealing where DQ issues may originate. The framework provides a structured approach for reporting DQ assessments, which can enhance transparency regarding data fitness for reuse and so support reliable clinical research, AI model development, and internal organizational governance. This work provides practical guidance for researchers seeking to understand data provenance and for organizations seeking to target DQ improvement efforts across the data lifecycle.
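
As a rough illustration of what mapping DQ parameters to lifecycle stages could look like in practice, the sketch below records each assessment together with the organization type and phase where the check took place, then counts reported issues per stage. It is not the paper's implementation. The abstract names neither the five phases nor the actors, so phases appear here only as placeholder indices 1 to 5, and the parameter names ("completeness", "plausibility"), actor labels, and sample records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Organization(Enum):
    # The two organization types the framework distinguishes.
    DATA_GENERATING = "data-generating"
    DATA_RECEIVING = "data-receiving"


@dataclass(frozen=True)
class DQAssessment:
    parameter: str              # hypothetical DQ parameter name
    organization: Organization  # where the check was performed
    phase: int                  # placeholder index 1..5, not the paper's phase names
    actor: str                  # hypothetical actor label
    issue_found: bool

    def __post_init__(self):
        # The framework defines five phases; reject anything outside that range.
        if not 1 <= self.phase <= 5:
            raise ValueError("phase must be between 1 and 5")


def issues_by_stage(assessments):
    """Count reported issues per (organization, phase) stage."""
    return Counter(
        (a.organization, a.phase) for a in assessments if a.issue_found
    )


# Made-up records, for illustration only.
report = [
    DQAssessment("completeness", Organization.DATA_GENERATING, 1, "clinician", True),
    DQAssessment("completeness", Organization.DATA_RECEIVING, 4, "data engineer", False),
    DQAssessment("plausibility", Organization.DATA_GENERATING, 2, "records staff", True),
    DQAssessment("plausibility", Organization.DATA_GENERATING, 2, "records staff", True),
]

counts = issues_by_stage(report)
for (org, phase), count in sorted(counts.items(), key=lambda kv: (kv[0][0].value, kv[0][1])):
    print(f"{org.value}, phase {phase}: {count} issue(s)")
```

Keeping the organization type and phase on every record means a reader of the report can see at which stage a check was run, which is the kind of provenance information the framework is meant to make explicit.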