This paper introduces MedVCR, a counterfactual reasoning framework for medical video diagnosis that addresses limitations of existing end-to-end methods by incorporating clinical priors and counterfactual comparisons. MedVCR uses a diffusion-based Counterfactual Generator to synthesize tissue evolution under different pathological states, a Counterfactual Representation Learning module to encode diagnostic knowledge through clinical rules, and a Dual Diagnostic Prediction strategy. Experiments on colposcopy and colonoscopy video diagnosis demonstrate performance gains of 2.6%-10.2% over state-of-the-art baselines, validating the effectiveness of the proposed approach.
Mimicking clinical diagnostic thinking with counterfactual reasoning improves medical video diagnosis performance by up to 10.2% over leading baselines.
Medical video diagnosis involves inferring clinical decisions from dynamic tissue responses observed over the course of an examination. Existing methods rely on an end-to-end learning paradigm that i) focuses on appearance rather than pathology, ii) lacks clinical priors, and iii) reasons solely from observations without counterfactual comparison. This work introduces MedVCR, a counterfactual reasoning framework that mimics clinical diagnostic thinking. MedVCR comprises three components: a Counterfactual Generator that uses a diffusion-based model to synthesize tissue evolution under specified pathological states; a Counterfactual Representation Learning module that encodes diagnostic knowledge through clinical rules (i.e., temporal consistency, pathological separability, and counterfactual alignment); and a Dual Diagnostic Prediction strategy that integrates video-level assessment with frame-level counterfactual analysis. MedVCR is evaluated under both fully supervised (e.g., colposcopy) and weakly supervised (e.g., colonoscopy) video diagnosis settings, yielding 2.6%-10.2% performance gains compared with leading baselines. Comprehensive ablation studies further validate the effectiveness of each component. The code will be released.
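The abstract does not say how the Dual Diagnostic Prediction strategy combines its two signals, so the following is only an illustrative sketch and not the authors' implementation. It assumes that frame-level evidence can be scored by the cosine similarity between observed frame features and counterfactual frame features generated for each pathological state, and that the two predictions are fused by a weighted average. The function names, the `alpha` weight, and the toy feature vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def frame_level_probs(observed, counterfactuals):
    """Score each pathological state by how closely its counterfactual
    frames match the observed frames (assumed scoring rule).

    observed:        list of per-frame feature vectors
    counterfactuals: dict mapping state name -> list of per-frame feature
                     vectors, aligned frame by frame with `observed`
    """
    states = list(counterfactuals)
    mean_sims = []
    for state in states:
        sims = [cosine(o, c) for o, c in zip(observed, counterfactuals[state])]
        mean_sims.append(sum(sims) / len(sims))
    return dict(zip(states, softmax(mean_sims)))


def dual_prediction(video_probs, frame_probs, alpha=0.5):
    """Fuse video-level and frame-level probabilities by weighted average
    (assumed fusion rule; `alpha` is a hypothetical hyperparameter)."""
    fused = {
        state: alpha * video_probs[state] + (1 - alpha) * frame_probs[state]
        for state in video_probs
    }
    label = max(fused, key=fused.get)
    return label, fused


if __name__ == "__main__":
    # Toy two-frame video with two-dimensional features.
    observed = [[1.0, 0.0], [1.0, 0.0]]
    counterfactuals = {
        "normal": [[1.0, 0.0], [1.0, 0.0]],
        "lesion": [[0.0, 1.0], [0.0, 1.0]],
    }
    video_probs = {"normal": 0.4, "lesion": 0.6}
    frame_probs = frame_level_probs(observed, counterfactuals)
    print(dual_prediction(video_probs, frame_probs))
```

In this toy setup the video-level classifier leans toward "lesion", while the observed frames match the "normal" counterfactual exactly, so the fused prediction depends on how `alpha` weights the two sources.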