The paper introduces a two-stage framework, ClinAlign, to align LLM outputs with clinician preferences by first creating HealthRubrics, a dataset of physician-verified preference examples, and then distilling these into HealthPrinciples, a set of reusable, clinically grounded principles. This approach enables scalable supervision for LLMs in healthcare by synthesizing rubrics for unlabeled queries and guiding self-revision at inference time. The authors demonstrate that a 30B-A3B model trained with ClinAlign achieves a 33.4% score on HealthBench-Hard, surpassing larger models and establishing a resource-efficient baseline for clinical alignment.
Forget RLHF: a new framework distills clinician preferences into reusable "HealthPrinciples" that let smaller models outperform giants on medical benchmarks.
Although large language models (LLMs) demonstrate expert-level medical knowledge, aligning their open-ended outputs with fine-grained clinician preferences remains challenging. Existing methods often rely on coarse objectives or unreliable automated judges that are weakly grounded in professional guidelines. We propose a two-stage framework to address this gap. First, we introduce HealthRubrics, a dataset of 7,034 physician-verified preference examples in which clinicians refine LLM-drafted rubrics to meet rigorous medical standards. Second, we distill these rubrics into HealthPrinciples: 119 broadly reusable, clinically grounded principles organized by clinical dimension, enabling scalable supervision beyond manual annotation. We use HealthPrinciples (1) for offline alignment, by synthesizing rubrics for unlabeled queries, and (2) as an inference-time tool for guided self-revision. A 30B-A3B model trained with our framework achieves 33.4% on HealthBench-Hard, outperforming much larger models including DeepSeek-R1 and o3, and establishing a resource-efficient baseline for clinical alignment.
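The inference-time use of HealthPrinciples can be pictured as a critique-and-revise loop: draft an answer, check it against the applicable principles, and regenerate with the violated principles as feedback. The sketch below is a minimal illustration of that loop, not the paper's implementation; the model call is stubbed out, and the principle strings and keyword-based violation check are illustrative assumptions.

```python
# Hypothetical sketch of principle-guided self-revision at inference time.
# In a real system, generate() would call the LLM and violated() would use
# a judge model scoring the answer against each HealthPrinciple.

def generate(prompt: str) -> str:
    """Stand-in for an LLM call; returns a placeholder draft."""
    return f"[draft answer for: {prompt}]"

def violated(answer: str, principles: list[str]) -> list[str]:
    """Toy check: a principle is 'violated' if its key term is absent."""
    return [p for p in principles if p.split(":")[0].lower() not in answer.lower()]

def self_revise(query: str, principles: list[str], max_rounds: int = 3) -> str:
    """Draft, then revise until no principles are violated (or rounds run out)."""
    answer = generate(query)
    for _ in range(max_rounds):
        missing = violated(answer, principles)
        if not missing:
            break
        critique = "; ".join(missing)
        answer = generate(f"{query}\nRevise to satisfy: {critique}")
    return answer

# Illustrative principles (invented for this sketch, not from the dataset):
principles = [
    "safety: advise seeking emergency care for red-flag symptoms",
    "uncertainty: state the limits of remote advice",
]
print(self_revise("Patient reports chest pain. What should they do?", principles))
```

The loop terminates either when the critic finds no remaining violations or after a fixed revision budget, which bounds the extra inference cost per query.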