This paper introduces a novel three-stage encoder with hierarchical feature representation for tooth image segmentation to address the limitations of fixed-resolution feature maps and the computational cost of transformer-based self-attention. The encoder captures scale-adaptive information and fuses cross-scale features to preserve fine structural information and contextual awareness. By incorporating a bidirectional sequence modeling strategy, the model enhances global spatial context understanding, achieving a 1.1% mIoU improvement on the OralVision dataset.
Ditch the quadratic complexity of transformers for high-resolution dental images: this new encoder uses bidirectional sequence modeling to capture global spatial context without the O(n^2) cost of self-attention.
Tooth image segmentation is a cornerstone of dental digitization. However, traditional image encoders relying on fixed-resolution feature maps often lead to discontinuous segmentation and poor discrimination between target regions and background, due to insufficient modeling of environmental and global context. Moreover, transformer-based self-attention introduces substantial computational overhead because of its quadratic complexity (O(n^2)), making it inefficient for high-resolution dental images. To address these challenges, we introduce a three-stage encoder with hierarchical feature representation to capture scale-adaptive information in dental images. By jointly leveraging low-level details and high-level semantics through cross-scale feature fusion, the model effectively preserves fine structural information while maintaining strong contextual awareness. Furthermore, a bidirectional sequence modeling strategy is incorporated to enhance global spatial context understanding without incurring high computational cost. We validate our method on two dental datasets, with experimental results demonstrating its superiority over existing approaches. On the OralVision dataset, our model achieves a 1.1% improvement in mean intersection over union (mIoU).
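For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union is commonly computed from integer label masks. It is not taken from the paper: the function name `mean_iou`, the choice to skip classes absent from both masks, and the toy masks are all assumptions made for illustration, and the authors' evaluation protocol may differ.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(cm)
    # Union = ground-truth pixels + predicted pixels - overlap, per class.
    union = cm.sum(axis=1) + cm.sum(axis=0) - intersection

    # Assumption: classes that appear in neither mask are excluded from the mean.
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())


# Toy 2x4 masks (0 = background, 1 = tooth); made-up values, not dataset samples.
target = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1]])
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1]])

score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.3f}")
```

In this toy case the background class has IoU 3/4 and the tooth class 4/5, so the mean is 0.775. Whether the reported 1.1% gain is absolute (percentage points) or relative is not stated in the abstract.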