This study introduces a paired acoustic stress test that measures how different noise types affect ambient clinical scribes built on Automatic Speech Recognition and Large Language Models. It finds a disconnect between traditional metrics such as Word Error Rate and clinical safety: minor acoustic disturbances can substantially raise the rate of unsafe outputs while only marginally affecting error rates. The authors also propose a lightweight mitigation strategy that reduces safety risk in noisy environments without model fine-tuning.
Minor acoustic noise can nearly double the rate of unsafe outputs in clinical documentation, despite only a slight increase in Word Error Rate.
Ambient clinical scribes increasingly combine Automatic Speech Recognition with Large Language Models to automate documentation. However, traditional metrics like Word Error Rate mask systemic safety degradation. We present a paired acoustic stress test to isolate the causal impact of noise on clinical reasoning. For the same dialogues, we inject diverse noise types while keeping the downstream model configuration frozen. Crucially, we uncover a dangerous disconnect between signal fidelity and clinical safety. Stationary ambient noise increased the Word Error Rate by a negligible 0.71 percentage points yet nearly doubled the rate of unsafe outputs. Our analysis reveals that minor acoustic perturbations can invert clinical meaning without substantially inflating error rates. Furthermore, we demonstrate a lightweight mitigation strategy that reduces safety degradation under noisy conditions without requiring model fine-tuning.
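To make the gap between Word Error Rate and clinical meaning concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `word_error_rate` and `paired_unsafe_rates`, the example transcripts, and the per-dialogue safety flags are all hypothetical. The sketch computes word-level WER with a standard edit distance, then compares unsafe-output rates for the same dialogues under clean and noisy audio, mirroring the paired design described above.

```python
from typing import Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def paired_unsafe_rates(
    clean_flags: Sequence[bool], noisy_flags: Sequence[bool]
) -> Tuple[float, float]:
    """Unsafe-output rates for the same dialogues under two audio conditions.

    Each position in both sequences refers to the same dialogue, so any
    difference is attributable to the audio condition alone.
    """
    if not clean_flags or len(clean_flags) != len(noisy_flags):
        raise ValueError("flag lists must be non-empty and of equal length")
    n = len(clean_flags)
    return sum(clean_flags) / n, sum(noisy_flags) / n


# Hypothetical transcripts: the noisy one drops a single word ("not").
reference = ("the patient reports no chest pain and is not allergic "
             "to penicillin or any other antibiotic")
noisy = ("the patient reports no chest pain and is allergic "
         "to penicillin or any other antibiotic")
wer = word_error_rate(reference, noisy)  # 1 deletion over 16 words

# Hypothetical safety labels for four dialogues, clean vs. noisy audio.
clean_rate, noisy_rate = paired_unsafe_rates(
    [False, False, True, False],
    [True, False, True, False],
)
```

In this toy case one deleted word out of sixteen gives a WER of 6.25%, yet the allergy statement is inverted. This is the kind of error an aggregate WER figure cannot distinguish from a harmless one, which is why the paired unsafe-output rate is tracked separately.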