The paper introduces MedXIAOHE, a medical vision-language foundation model designed for medical understanding and reasoning in real-world clinical settings. The authors propose an entity-aware continual pretraining framework to broaden knowledge coverage and reduce long-tail gaps such as rare diseases, and they use reinforcement learning and tool-augmented agentic training to support expert-level, multi-step diagnostic reasoning. The model also integrates user-preference rubrics and evidence-grounded reasoning to improve reliability and reduce hallucinations.
MedXIAOHE reports state-of-the-art results on diverse medical benchmarks, surpassing leading closed-source multimodal systems on multiple capabilities, by combining entity-aware continual pretraining with reinforcement learning and tool-augmented reasoning.
We present MedXIAOHE, a medical vision-language foundation model designed to advance general-purpose medical understanding and reasoning in real-world clinical applications. MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities. To achieve this, we propose an entity-aware continual pretraining framework that organizes heterogeneous medical corpora to broaden knowledge coverage and reduce long-tail gaps (e.g., rare diseases). For medical expert-level reasoning and interaction, MedXIAOHE incorporates diverse medical reasoning patterns via reinforcement learning and tool-augmented agentic training, enabling multi-step diagnostic reasoning with verifiable decision traces. To improve reliability in real-world use, MedXIAOHE integrates user-preference rubrics, evidence-grounded reasoning, and low-hallucination long-form report generation, with improved adherence to medical instructions. We release this report to document our practical design choices, scaling insights, and evaluation framework, hoping to inspire further research.