This paper investigates physicians' perceptions of LLM capabilities in clinical reasoning to understand trust calibration in AI-assisted diagnosis. The study presented clinical cases to physicians (N=37), collected their evaluations of LLM-generated analyses, and compared these perceptions with benchmark performance. The results reveal discrepancies between benchmark scores and physician-perceived value, highlighting the limitations of current evaluation metrics and informing strategies for building trustworthy LLM-physician collaboration.
Physicians' trust in LLMs for diagnosis hinges on reasoning aspects not captured by standard benchmarks, revealing a critical gap in current evaluation practices.
Large language models (LLMs) have shown considerable potential in supporting medical diagnosis. However, their effective integration into clinical workflows is hindered by physicians' difficulties in perceiving and trusting LLM capabilities, which often results in miscalibrated trust. Existing model evaluations primarily emphasize standardized benchmarks and predefined tasks, offering limited insight into clinical reasoning practices. Moreover, research on human-AI collaboration has rarely examined physicians' perceptions of LLMs' clinical reasoning capability. In this work, we investigate how physicians perceive LLMs' capabilities in the clinical reasoning process. We designed clinical cases, collected the corresponding LLM-generated analyses, and obtained evaluations from physicians (N=37) to quantitatively represent their perceived LLM diagnostic capabilities. By comparing these perceived evaluations with benchmark performance, our study highlights the aspects of clinical reasoning that physicians value and underscores the limitations of benchmark-based evaluation. We further discuss the implications and opportunities for enhancing trustworthy collaboration between physicians and LLMs in LLM-supported clinical reasoning.