Search papers, labs, and topics across Lattice.
The authors introduce IOSVLM, a 3D vision-language model (VLM) designed for unified dental diagnosis directly from intraoral scans (IOS) represented as point clouds. To address challenges like heterogeneous scan forms and limited paired data, they propose a geometry-to-chromatic proxy to bridge the gap between color-free IOS data and color-dependent 3D pre-training, along with a two-stage curriculum training strategy. Evaluated on a newly created large-scale IOS diagnosis VQA dataset (IOSVQA), IOSVLM demonstrates significant performance gains over existing methods, achieving at least +9.58% macro accuracy and +1.46% macro F1.
Directly modeling 3D geometry in dental scans unlocks a 9.58% accuracy boost in multi-disease diagnosis compared to methods relying on 2D or multi-view image representations.
3D intraoral scans (IOS) are increasingly adopted in routine dentistry due to abundant geometric evidence, and unified multi-disease diagnosis is desirable for clinical documentation and communication. While recent works introduce dental vision-language models (VLMs) to enable unified diagnosis and report generation on 2D images or multi-view images rendered from IOS, they do not fully leverage native 3D geometry. Such work is necessary and also challenging, due to: (i) heterogeneous scan forms and the complex IOS topology, (ii) multi-disease co-occurrence with class imbalance and fine-grained morphological ambiguity, (iii) limited paired 3D IOS-text data. Thus, we present IOSVLM, an end-to-end 3D VLM that represents scans as point clouds and follows a 3D encoder-projector-LLM design for unified diagnosis and generative visual question-answering (VQA), together with IOSVQA, a large-scale multi-source IOS diagnosis VQA dataset comprising 19,002 cases and 249,055 VQA pairs over 23 oral diseases and heterogeneous scan types. To address the distribution gap between color-free IOS data and color-dependent 3D pre-training, we propose a geometry-to-chromatic proxy that stabilizes fine-grained geometric perception and cross-modal alignment. A two-stage curriculum training strategy further enhances robustness. IOSVLM consistently outperforms strong baselines, achieving gains of at least +9.58% macro accuracy and +1.46% macro F1, indicating the effectiveness of direct 3D geometry modeling for IOS-based diagnosis.