This paper introduces UniReason-Med, a unified framework that improves 3D medical visual question answering (VQA) by using grounded reasoning supervision from 2D medical images. Using a shared reasoning interface and a new 220K-sample dataset, UniMed-CoT, which contains 170K 2D and 50K 3D samples, the authors show that joint 2D+3D training substantially improves 3D reasoning over 3D-only training. Ablations indicate that grounding and region-token injection benefit both 2D and 3D tasks.
Grounded reasoning supervision from 2D images improves 3D medical VQA when both input types share a common reasoning interface.
We study whether grounded reasoning supervision from abundant 2D medical images can improve 3D medical VQA when both input types are aligned through a common reasoning interface. We introduce UniReason-Med, a single-checkpoint framework that processes either a 2D image or a slice-serialized 3D volume at inference time, generating interleaved textual reasoning and localized visual evidence through shared box syntax, region-token injection, and a common grounded reasoning policy. To train this interface, we construct UniMed-CoT, a 220K instruction-tuning dataset with interleaved textual reasoning and grounded visual evidence, including 170K 2D and 50K 3D samples. Through supervised fine-tuning followed by outcome-level reinforcement learning, UniReason-Med learns to generate grounded reasoning traces without IoU/Dice-based localization rewards during RL. Data-mixture and component ablations show that joint 2D+3D grounded supervision substantially improves 3D reasoning over 3D-only training, while grounding and region-token injection consistently benefit both 2D and 3D tasks. These results suggest that a shared grounded reasoning interface can transfer reasoning structure from 2D images to slice-serialized volumetric medical understanding. The code and data are publicly available at https://github.com/IQuestLab/unireason-med.
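To make the idea of a shared interface concrete, the following is a minimal sketch of how a 2D image and a 3D volume could both be reduced to a sequence of 2D slices, with localized evidence written in one box syntax that carries a slice index. Everything here is an assumption for illustration: the `<box>[...]</box>` format, the function names `serialize`, `format_box`, and `parse_boxes`, the `max_slices` parameter, and the uniform slice sampling are hypothetical and are not taken from the UniReason-Med code.

```python
import re
from typing import List, Tuple

Slice = List[List[float]]             # one 2D image: rows of pixel values
Box = Tuple[int, int, int, int, int]  # (slice_index, x0, y0, x1, y1)

# Hypothetical box syntax for illustration; the paper's actual format may differ.
BOX_PATTERN = re.compile(r"<box>\[(\d+), (\d+), (\d+), (\d+), (\d+)\]</box>")


def is_volume(data) -> bool:
    """A volume is a list of slices, so each row entry is itself a list."""
    return isinstance(data[0][0], list)


def serialize(data, max_slices: int = 4) -> List[Slice]:
    """Turn a 2D image or a 3D volume into a list of 2D slices.

    A 2D image becomes a one-element list, so downstream code sees the
    same structure for both input types. Uniform sampling is an assumed
    strategy, not necessarily the one used by the authors.
    """
    if not is_volume(data):
        return [data]
    depth = len(data)
    if depth <= max_slices:
        return list(data)
    if max_slices < 2:
        return [data[depth // 2]]  # fall back to the middle slice
    # Evenly spaced indices that include the first and last slice.
    indices = [(i * (depth - 1)) // (max_slices - 1) for i in range(max_slices)]
    return [data[i] for i in indices]


def format_box(box: Box) -> str:
    """Write a box in the shared (hypothetical) text syntax."""
    return "<box>[{}, {}, {}, {}, {}]</box>".format(*box)


def parse_boxes(text: str) -> List[Box]:
    """Recover every box from a reasoning trace that interleaves text and boxes."""
    return [tuple(int(v) for v in match) for match in BOX_PATTERN.findall(text)]


# Usage: a 4x4 image and a 10-slice volume pass through the same function.
image = [[0.0] * 4 for _ in range(4)]
volume = [[[float(z)] * 4 for _ in range(4)] for z in range(10)]
image_slices = serialize(image)
volume_slices = serialize(volume)
trace = "The opacity " + format_box((2, 1, 1, 3, 3)) + " supports the answer."
boxes = parse_boxes(trace)
```

In this sketch a 2D image is simply the single-slice case with slice index 0, which is one plausible way for a single checkpoint to handle both input types through the same text format.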