This paper introduces Curriculum Goal-Conditioned Learning (CGCL), a novel training pipeline that guides LLMs to generate clinical diagnostic arguments structured according to the Toulmin model. CGCL uses a three-stage curriculum to progressively train LLMs to extract facts, justify hypotheses, and synthesize qualified conclusions. Validated with T-Eval, CGCL achieves diagnostic accuracy and reasoning quality comparable to RL methods, but with improved training stability and efficiency, addressing the critical need for transparent and reliable reasoning in healthcare applications of LLMs.
By training LLMs to justify each diagnostic reasoning step explicitly, CGCL matches resource-intensive RL methods in diagnostic accuracy and reasoning quality while training more stably and efficiently.
The integration of Large Language Models (LLMs) into clinical decision support is critically obstructed by their opaque and often unreliable reasoning. In the high-stakes domain of healthcare, correct answers alone are insufficient; clinical practice demands full transparency to ensure patient safety and enable professional accountability. A pervasive and dangerous weakness of current LLMs is their tendency to produce "correct answers through flawed reasoning." This issue is far more than a minor academic flaw; such process errors signal a fundamental lack of robust understanding, making the model prone to broader hallucinations and unpredictable failures when faced with real-world clinical complexity. In this paper, we establish a framework for trustworthy clinical argumentation by adapting the Toulmin model to the diagnostic process. We propose a novel training pipeline, Curriculum Goal-Conditioned Learning (CGCL), designed to progressively train LLMs to generate diagnostic arguments that explicitly follow this Toulmin structure. CGCL's three-stage curriculum systematically builds a solid clinical argument: (1) extracting facts and generating differential diagnoses; (2) justifying a core hypothesis while rebutting alternatives; and (3) synthesizing the analysis into a final, qualified conclusion. We validate CGCL using T-Eval, a quantitative framework that measures the integrity of diagnostic reasoning. Experiments show that our method achieves diagnostic accuracy and reasoning quality comparable to resource-intensive Reinforcement Learning (RL) methods, while offering a more stable and efficient training pipeline.
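To make the three-stage structure concrete, the sketch below shows one way a Toulmin-style diagnostic argument could be represented and checked for stage-by-stage completeness. It is an illustration only, not the paper's implementation: the class `DiagnosticArgument`, its field names, the `STAGE_FIELDS` mapping from stages to Toulmin components, and the helper `completed_stage` are all assumptions introduced here, and the clinical values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiagnosticArgument:
    """Toulmin-style container. Field names are illustrative, not the paper's schema."""
    grounds: List[str] = field(default_factory=list)          # extracted clinical facts
    differentials: List[str] = field(default_factory=list)    # candidate diagnoses
    warrant: str = ""                                          # why the grounds support the hypothesis
    rebuttals: Dict[str, str] = field(default_factory=dict)   # alternative -> why it is less likely
    claim: str = ""                                            # final diagnosis
    qualifier: str = ""                                        # stated degree of certainty


# Assumed mapping from each curriculum stage to the fields it is expected to fill.
STAGE_FIELDS = {
    1: ("grounds", "differentials"),   # extract facts, list differential diagnoses
    2: ("warrant", "rebuttals"),       # justify the core hypothesis, rebut alternatives
    3: ("claim", "qualifier"),         # synthesize a final, qualified conclusion
}


def completed_stage(arg: DiagnosticArgument) -> int:
    """Return the highest stage n such that stages 1..n all have non-empty fields."""
    done = 0
    for stage in sorted(STAGE_FIELDS):
        if all(getattr(arg, name) for name in STAGE_FIELDS[stage]):
            done = stage
        else:
            break  # a later stage does not count if an earlier one is incomplete
    return done


# Made-up example: only the stage 1 fields are filled in.
example = DiagnosticArgument(
    grounds=["fever", "productive cough"],
    differentials=["pneumonia", "bronchitis"],
)
print(completed_stage(example))  # 1
```

Requiring earlier stages to be complete before later ones count mirrors the progressive ordering described above: under this sketch, a conclusion without a stated justification would not be treated as a finished argument.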