Verily Health Inc | Apr 29, 2026 | arXiv:2604.27045

Detecting Clinical Discrepancies in Health Coaching Agents: A Dual-Stream Memory and Reconciliation Architecture

Samuel L Pugh, Eric Yang, Alexander Muir Sutherland, Alessandra Breschi

AI Summary

This paper introduces a Dual-Stream Memory Architecture for health coaching agents that separates the patient narrative from structured EHR data (FHIR) and uses a Reconciliation Engine to detect clinical discrepancies between the two. The architecture addresses the problem of reconciling patient self-reports, which are current but prone to recall bias, with EHR data that is medically validated but frequently stale. In isolated testing on a hybrid dataset, the engine detected 84.4% of designed clinical discrepancies with 86.7% safety-critical recall. A coupled evaluation of extraction and reconciliation revealed a 13.6% error cascade, traced to clinical details lost during memory extraction.

Key Contribution

LLM-powered health coaching agents can detect and flag discrepancies between what a patient reports and what the patient's official medical record states, a step toward safer and more reliable longitudinal care.

Abstract

As Large Language Model (LLM) agents transition from single-session tools to persistent systems managing longitudinal healthcare journeys, their memory architectures face a critical challenge: reconciling two imperfect sources of truth. The patient's evolving self-report is current but prone to recall bias, while the Electronic Health Record (EHR) is medically validated but frequently stale. General-purpose agent memory systems optimize for coherence by overwriting older facts with the user's latest statement, a pattern that risks safety failures when applied to clinical data. We introduce a Dual-Stream Memory Architecture that strictly separates the patient narrative from the structured clinical record (FHIR), governed by a dedicated Reconciliation Engine that evaluates every extracted memory against the patient's FHIR profile and classifies discrepancies by type, severity, and the specific FHIR resources involved. We evaluate this architecture on 26 patients across 675 longitudinal wellness coaching sessions, using a hybrid dataset that interleaves real provider-patient transcripts with synthetic, FHIR-grounded clinical scenarios. In isolated testing, the engine detects 84.4% of designed clinical discrepancies with 86.7% safety-critical recall. By coupling extraction and reconciliation evaluation on the same data, we directly quantify a 13.6% error cascade, tracing the degradation to clinical details lost during memory extraction from unstructured conversation rather than to downstream classification errors. These findings establish that validating patient-reported memories against clinical records is both feasible and necessary for safe deployment of longitudinal health agents.
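The dual-stream idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration, not the authors' implementation: the class names, the `reconcile` function, the `"taking"`/`"stopped"` claim labels, the discrepancy type names, and the two-level severity scale are all hypothetical. The sketch also swaps in a simple rule-based comparison on medication status, whereas the paper does not specify in this abstract how its Reconciliation Engine classifies. The point it shows is only the pattern: the patient narrative and the clinical record are kept as separate streams, and a conflict yields a flagged discrepancy (type, severity, resources involved) instead of an overwrite.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical two-level scale; the paper's severity taxonomy may differ.
    INFO = "info"
    SAFETY_CRITICAL = "safety_critical"


@dataclass(frozen=True)
class PatientMemory:
    """One fact extracted from conversation (the narrative stream)."""
    subject: str  # e.g. a medication name
    claim: str    # hypothetical label: "taking" or "stopped"


@dataclass(frozen=True)
class RecordEntry:
    """Simplified stand-in for a FHIR resource (the clinical stream)."""
    resource_id: str  # e.g. "MedicationRequest/example-1"
    subject: str
    status: str       # simplified here to "active" or "stopped"


@dataclass(frozen=True)
class Discrepancy:
    kind: str
    severity: Severity
    resource_ids: tuple


def reconcile(memory, record):
    """Compare one memory with the record; neither stream is modified."""
    matches = [e for e in record if e.subject.lower() == memory.subject.lower()]
    if not matches:
        # Patient reports taking something the record does not list.
        if memory.claim == "taking":
            return Discrepancy("missing_from_record", Severity.INFO, ())
        return None
    conflicting = [
        e for e in matches
        if (memory.claim == "stopped" and e.status == "active")
        or (memory.claim == "taking" and e.status == "stopped")
    ]
    if not conflicting:
        return None
    # Severity assignment is an illustrative assumption, not a clinical rule.
    return Discrepancy(
        "status_conflict",
        Severity.SAFETY_CRITICAL,
        tuple(e.resource_id for e in conflicting),
    )


record = [RecordEntry("MedicationRequest/example-1", "metformin", "active")]
memory = PatientMemory("Metformin", "stopped")
result = reconcile(memory, record)
```

In this toy case the patient says they stopped a medication that the record still lists as active, so `result` is a flagged discrepancy that names the conflicting resource. A general-purpose memory that simply overwrote the older fact with the latest statement would lose exactly this signal.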

Natural Language Processing, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
