Feb 26, 2026 · arXiv:2602.22973

Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots

Dimitrios P. Panagoulias, Evangelia-Aikaterini Tsichrintzi, Georgios Savvidis, Evridiki Tsoureli-Nikita

AI Summary

This paper introduces a diagnostic alignment framework that preserves AI-generated, image-based reports as immutable inference states and compares them with physician-validated outcomes to model the alignment between AI and expert diagnoses. The inference pipeline uses a vision-enabled LLM, BERT-based medical entity extraction, and a Sequential Language Model Inference (SLMI) step to refine reports before expert review. Evaluation on 21 dermatological cases using a four-level concordance framework showed 71.4% exact agreement, which did not change under semantic-similarity adjustment, and 100% comprehensive concordance (95% CI: [83.9%, 100%]), indicating that binary lexical evaluation underestimates clinically meaningful alignment.

Key Contribution

Binary lexical evaluation substantially underestimates clinically meaningful alignment between AI and expert physician diagnoses: on 21 dermatological cases, the proposed framework reports 100% comprehensive concordance even though exact lexical agreement was only 71.4%.

Abstract

Human-in-the-loop validation is essential in safety-critical clinical AI, yet the transition between initial model inference and expert correction is rarely analyzed as a structured signal. We introduce a diagnostic alignment framework in which the AI-generated image-based report is preserved as an immutable inference state and systematically compared with the physician-validated outcome. The inference pipeline integrates a vision-enabled large language model, BERT-based medical entity extraction, and a Sequential Language Model Inference (SLMI) step to enforce domain-consistent refinement prior to expert review. Evaluation on 21 dermatological cases (21 complete AI-physician pairs) employed a four-level concordance framework comprising exact primary match rate (PMR), semantic similarity-adjusted rate (AMR), cross-category alignment, and Comprehensive Concordance Rate (CCR). Exact agreement reached 71.4% and remained unchanged under semantic similarity (t = 0.60), while structured cross-category and differential overlap analysis yielded 100% comprehensive concordance (95% CI: [83.9%, 100%]). No cases demonstrated complete diagnostic divergence. These findings show that binary lexical evaluation substantially underestimates clinically meaningful alignment. Modeling expert validation as a structured transformation enables signal-aware quantification of correction dynamics and supports traceable, human-aligned evaluation of image-based clinical decision support systems.
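
The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that 71.4% corresponds to 15 exact matches out of 21 cases, and that the reported interval is a two-sided exact (Clopper-Pearson) binomial interval, which for the all-successes case reduces to a lower bound of (alpha/2)^(1/n). That choice reproduces the reported 83.9% lower bound; a Wilson interval would give roughly 84.5% instead. The function name `exact_ci_all_successes` is hypothetical.

```python
def exact_ci_all_successes(n: int, alpha: float = 0.05) -> tuple:
    """Two-sided Clopper-Pearson interval for the special case x == n.

    When every one of the n trials is a success, the upper bound is 1
    and the lower bound has the closed form (alpha / 2) ** (1 / n).
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    lower = (alpha / 2) ** (1.0 / n)
    return lower, 1.0


n_cases = 21   # number of AI-physician pairs reported in the abstract
n_exact = 15   # ASSUMPTION: inferred from 71.4% of 21, not stated in the abstract

# Exact primary match rate (PMR): share of cases with an exact primary match.
pmr = n_exact / n_cases

# Comprehensive Concordance Rate (CCR) was 21/21, so the special case applies.
ccr_lower, ccr_upper = exact_ci_all_successes(n_cases)

print(f"PMR = {pmr:.1%}")
print(f"CCR 95% CI = [{ccr_lower:.1%}, {ccr_upper:.0%}]")
```

With only 21 cases, the interval stays wide even at 100% observed concordance, which is why the lower bound sits near 84% and not closer to 100%.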

Computer Vision · Multimodal Models · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 24

Year: 2026

Venue: N/A
