The paper introduces D-FINE-seg, an instance segmentation extension of the D-FINE object detector that adds a lightweight mask head, segmentation-aware training with specialized mask loss functions, and denoising mask supervision. Under a unified TensorRT FP16 benchmarking protocol, D-FINE-seg achieves a higher F1-score than Ultralytics YOLO26 on the TACO dataset while maintaining competitive latency. The authors also release an end-to-end open-source pipeline for training, exporting, and optimized inference with ONNX, TensorRT, and OpenVINO, covering both object detection and instance segmentation.
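The abstract names two of the specialized mask losses: box-cropped BCE and Dice. The pure-Python sketch below shows one plausible reading. It assumes "box-cropped" means the per-pixel BCE is averaged only over pixels inside the ground-truth box, that Dice is computed over the full mask, and that the two terms are summed with equal weights. The function names, the smoothing constant, and the weights are illustrative assumptions, not code from the D-FINE-seg repository.

```python
import math
from typing import List, Tuple

Mask = List[List[float]]          # H x W grid of logits or {0, 1} targets
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels; x2/y2 exclusive (assumption)


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(x: float, t: float) -> float:
    # Stable binary cross-entropy on a raw logit x with target t in {0, 1}.
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))


def box_cropped_bce(logits: Mask, target: Mask, box: Box) -> float:
    """Mean BCE over pixels inside the ground-truth box only (assumed meaning)."""
    x1, y1, x2, y2 = box
    losses = [
        bce_with_logits(logits[r][c], target[r][c])
        for r in range(y1, y2)
        for c in range(x1, x2)
    ]
    return sum(losses) / max(len(losses), 1)


def dice_loss(logits: Mask, target: Mask, smooth: float = 1.0) -> float:
    """1 - soft Dice coefficient, computed over the whole mask."""
    probs = [sigmoid(v) for row in logits for v in row]
    gts = [float(t) for row in target for t in row]
    inter = sum(p * g for p, g in zip(probs, gts))
    return 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(gts) + smooth)


def mask_loss(logits: Mask, target: Mask, box: Box,
              w_bce: float = 1.0, w_dice: float = 1.0) -> float:
    # Hypothetical combination: weighted sum of box-cropped BCE and Dice.
    return w_bce * box_cropped_bce(logits, target, box) + w_dice * dice_loss(logits, target)
```

Cropping the BCE to the box keeps the loss focused on pixels near the object, so background pixels far from it do not affect that term, while Dice still penalizes mask area predicted outside the box because it is computed over the whole grid.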
D-FINE-seg outperforms YOLO26 in F1-score on the TACO dataset for instance segmentation at competitive latency, showing that transformer-based detectors can offer strong accuracy-latency trade-offs for real-time segmentation.
Transformer-based real-time object detectors achieve strong accuracy-latency trade-offs, and D-FINE is among the top-performing recent architectures. However, real-time instance segmentation with transformers remains less common. We present D-FINE-seg, an instance segmentation extension of D-FINE that adds a lightweight mask head; segmentation-aware training, including box-cropped BCE and Dice mask losses; auxiliary and denoising mask supervision; and an adapted Hungarian matching cost. On the TACO dataset, D-FINE-seg improves F1-score over Ultralytics YOLO26 under a unified end-to-end TensorRT FP16 benchmarking protocol while maintaining competitive latency. Our second contribution is an end-to-end pipeline for training, exporting, and optimized inference across ONNX, TensorRT, and OpenVINO for both object detection and instance segmentation. The framework is released as open-source under the Apache-2.0 license. GitHub repository: https://github.com/ArgoHA/D-FINE-seg.
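The abstract mentions an adapted Hungarian matching cost without spelling it out. As a hedged sketch, assume the adaptation adds a mask Dice term to a DETR-style matching cost built from a classification term and an L1 box term. The weights (`w_cls`, `w_box`, `w_dice`), the dict-based query/target format, and all function names are hypothetical. For readability the assignment is found by brute force over all injective query choices rather than by the Hungarian algorithm itself; both yield a minimum-cost one-to-one matching, but brute force only scales to toy inputs.

```python
import itertools
from typing import Dict, List, Sequence, Tuple


def dice_cost(pred: Sequence[float], gt: Sequence[float], smooth: float = 1.0) -> float:
    # 1 - soft Dice between a flattened probability mask and a binary GT mask.
    inter = sum(p * g for p, g in zip(pred, gt))
    return 1.0 - (2.0 * inter + smooth) / (sum(pred) + sum(gt) + smooth)


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def pair_cost(query: Dict, target: Dict,
              w_cls: float = 2.0, w_box: float = 5.0, w_dice: float = 1.0) -> float:
    # Hypothetical cost: -p(class) + L1 box distance + mask Dice cost.
    cls_cost = -query["probs"][target["label"]]
    return (w_cls * cls_cost
            + w_box * l1(query["box"], target["box"])
            + w_dice * dice_cost(query["mask"], target["mask"]))


def match(queries: List[Dict], targets: List[Dict]) -> List[Tuple[int, int]]:
    """Return sorted (query_idx, target_idx) pairs with minimum total cost.

    Assumes len(queries) >= len(targets). Exact but exponential; real code
    would use a Hungarian solver such as scipy.optimize.linear_sum_assignment.
    """
    cost = [[pair_cost(q, t) for t in targets] for q in queries]
    best: List[Tuple[int, int]] = []
    best_total = float("inf")
    # perm[t] is the query assigned to target t.
    for perm in itertools.permutations(range(len(queries)), len(targets)):
        total = sum(cost[q][t] for t, q in enumerate(perm))
        if total < best_total:
            best_total = total
            best = [(q, t) for t, q in enumerate(perm)]
    return sorted(best)


if __name__ == "__main__":
    # Toy data: boxes as normalized (cx, cy, w, h), masks flattened to 4 pixels.
    queries = [
        {"probs": [0.1, 0.9], "box": (0.7, 0.7, 0.2, 0.2), "mask": [0, 0, 1, 1]},
        {"probs": [0.9, 0.1], "box": (0.3, 0.3, 0.2, 0.2), "mask": [1, 1, 0, 0]},
        {"probs": [0.5, 0.5], "box": (0.5, 0.5, 0.9, 0.9), "mask": [0.5] * 4},
    ]
    targets = [
        {"label": 0, "box": (0.3, 0.3, 0.2, 0.2), "mask": [1, 1, 0, 0]},
        {"label": 1, "box": (0.7, 0.7, 0.2, 0.2), "mask": [0, 0, 1, 1]},
    ]
    print(match(queries, targets))  # -> [(0, 1), (1, 0)]
```

Adding a mask term to the cost lets matching prefer queries whose predicted masks already overlap the target, so the mask losses are applied to queries that are consistent in class, box, and shape. Unmatched queries (query 2 in the toy example) would typically be supervised as "no object".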