This paper introduces Boxes2Pixels, a noise-robust distillation framework for defect segmentation that learns from noisy pseudo-masks generated by SAM from bounding box annotations. The framework combines a hierarchical decoder over frozen DINOv2 features, an auxiliary binary localization head, and a one-sided online self-correction mechanism to counter SAM's tendency to hallucinate background structure and miss sparse defects. On a manually annotated wind turbine inspection benchmark, Boxes2Pixels improves anomaly mIoU by +6.97 and binary IoU by +9.71 over the strongest baseline trained with the same weak supervision, while using 80% fewer trainable parameters.
You can get surprisingly accurate defect segmentation from noisy SAM pseudo-masks by treating SAM as a flawed teacher and correcting its mistakes during training.
Accurate defect segmentation is critical for industrial inspection, yet dense pixel-level annotations are rarely available. A common workaround is to convert inexpensive bounding boxes into pseudo-masks using foundation segmentation models such as the Segment Anything Model (SAM). However, these pseudo-labels are systematically noisy on industrial surfaces, often hallucinating background structure while missing sparse defects. To address this limitation, Boxes2Pixels, a noise-robust box-to-pixel distillation framework, is proposed; it treats SAM as a noisy teacher rather than a source of ground-truth supervision. Bounding boxes are converted into pseudo-masks offline by SAM, and a compact student is trained with (i) a hierarchical decoder over frozen DINOv2 features for semantic stability, (ii) an auxiliary binary localization head to decouple sparse foreground discovery from class prediction, and (iii) a one-sided online self-correction mechanism that relaxes background supervision when the student is confident, targeting teacher false negatives. On a manually annotated wind turbine inspection benchmark, Boxes2Pixels improves anomaly mIoU by +6.97 and binary IoU by +9.71 over the strongest baseline trained under identical weak supervision. Moreover, online self-correction increases binary recall by +18.56, and the model uses 80% fewer trainable parameters. Code is available at https://github.com/CLendering/Boxes2Pixels.
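
The one-sided self-correction idea in (iii) can be illustrated with a minimal sketch. It is not taken from the released code: the abstract does not give the exact rule, so the function names `relaxed_pixels` and `one_sided_bce`, the confidence threshold `tau`, and the choice of binary cross-entropy are all assumptions made for illustration. The sketch drops a pixel from the loss only when the teacher labels it background and the student is confident it is foreground; teacher foreground labels are never relaxed.

```python
import math


def relaxed_pixels(student_prob, teacher_mask, tau=0.9):
    """Flag pixels whose background label is relaxed (hypothetical rule).

    student_prob: flat list of student foreground probabilities in (0, 1).
    teacher_mask: flat list of 0/1 teacher pseudo-labels (1 = defect).
    tau: assumed confidence threshold, not a value from the paper.
    """
    # One-sided: only teacher-background pixels (t == 0) can be relaxed.
    return [t == 0 and p >= tau for p, t in zip(student_prob, teacher_mask)]


def one_sided_bce(student_prob, teacher_mask, tau=0.9):
    """Mean binary cross-entropy over the pixels that are not relaxed."""
    eps = 1e-7
    skip = relaxed_pixels(student_prob, teacher_mask, tau)
    total, kept = 0.0, 0
    for p, t, s in zip(student_prob, teacher_mask, skip):
        if s:
            continue  # suspected teacher false negative: no background penalty
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -math.log(p) if t == 1 else -math.log(1.0 - p)
        kept += 1
    return total / kept if kept else 0.0


# Four pixels: the first is teacher-background but the student is confident.
probs = [0.95, 0.95, 0.2, 0.1]
mask = [0, 1, 1, 0]
print(relaxed_pixels(probs, mask))
print(one_sided_bce(probs, mask))
```

In this toy example only the first pixel is relaxed. The third pixel (teacher foreground, low student probability) stays fully supervised, which is what makes the correction one-sided: it can recover defects the teacher missed, but it does not let the student discard defects the teacher found.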