Mar 30, 2026 · arXiv:2603.28297

DinoDental: Benchmarking DINOv3 as a Unified Vision Encoder for Dental Image Analysis

Kun Tang, Xinquan Yang, Mianjie Zheng, Xuefen Liu, Xuguang Li, Xiaoqi Guo, Ruihan Chen, Linlin Shen, He Meng

AI Summary

The paper introduces DinoDental, a benchmark that evaluates how well the DINOv3 self-supervised vision model transfers to dental image analysis tasks, including classification, detection, and instance segmentation. The authors systematically analyze DINOv3's performance on panoramic radiographs and intraoral photographs while varying model size, input resolution, and adaptation strategy (frozen features, full fine-tuning, and LoRA). The results indicate that DINOv3 is a strong unified encoder, with the clearest advantages in intraoral image understanding and boundary-sensitive dense prediction, and the benchmark establishes a baseline for dental AI research.

Key Contribution

DINOv3, a vision foundation model used without domain-specific pre-training, serves as a strong unified encoder for dental image analysis, with particularly clear advantages for intraoral image understanding and boundary-sensitive dense prediction.

Abstract

The scarcity and high cost of expert annotations in dental imaging present a significant challenge for the development of AI in dentistry. DINOv3, a state-of-the-art, self-supervised vision foundation model pre-trained on 1.7 billion images, offers a promising pathway to mitigate this issue. However, its reliability when transferred to the dental domain, with its unique imaging characteristics and clinical subtleties, remains unclear. To address this, we introduce DinoDental, a unified benchmark designed to systematically evaluate whether DINOv3 can serve as a reliable, off-the-shelf encoder for comprehensive dental image analysis without requiring domain-specific pre-training. Constructed from multiple public datasets, DinoDental covers a wide range of tasks, including classification, detection, and instance segmentation on both panoramic radiographs and intraoral photographs. We further analyze the model's transfer performance by scaling its size and input resolution, and by comparing different adaptation strategies, including frozen features, full fine-tuning, and the parameter-efficient Low-Rank Adaptation (LoRA) method. Our experiments show that DINOv3 can serve as a strong unified encoder for dental image analysis across both panoramic radiographs and intraoral photographs, remaining competitive across tasks while showing particularly clear advantages for intraoral image understanding and boundary-sensitive dense prediction. Collectively, DinoDental provides a systematic framework for comprehensively evaluating DINOv3 in dental analysis, establishing a foundational benchmark to guide efficient and effective model selection and adaptation for the dental AI community.
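The LoRA strategy named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the class name `LoRALinear`, the helper `matvec`, and the rank and scaling values are all hypothetical, and the abstract does not say which layers or ranks the benchmark uses. The sketch only shows the general idea of keeping a pre-trained weight W frozen and learning a low-rank update (alpha / r) * B A alongside it.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical linear layer: frozen W plus a trainable low-rank update.

    Output is W x + (alpha / r) * B (A x). Only A and B would be trained.
    """

    def __init__(self, d_in, d_out, r=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        # Stand-in for a pre-trained weight; kept frozen during adaptation.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable down-projection (r x d_in), small random init.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        # Trainable up-projection (d_out x r), zero init so the update starts at 0.
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=64, d_out=64, r=4)
print(layer.trainable_params(), layer.frozen_params())
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer's output exactly, and the number of trainable parameters grows with r * (d_in + d_out) rather than d_in * d_out. Frozen-feature evaluation corresponds to training neither W nor the update, and full fine-tuning corresponds to training W itself.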

Computer Vision · Eval Frameworks & Benchmarks · Scientific Discovery & Drug Design

