This paper introduces RT-SDGOD, a problem setting for real-time single-domain generalized object detection under the distribution shifts found in real-world environments, and RT-SDGDet, a multi-evidence collaborative modeling framework for that setting. By relying on training-time representation learning, RT-SDGDet improves the stability and coverage of object-level evidence without incurring additional inference cost. Experimental results show that RT-SDGDet generalizes better than existing methods across multiple unseen target domains.
Real-time object detectors can achieve cross-domain generalization without any extra inference overhead by leveraging collaborative evidence modeling during training.
In real-world deployment under strict real-time constraints, weather and imaging variations induce significant distribution shifts that severely degrade detectors. Single-Domain Generalized Object Detection aims to mitigate this issue, yet existing methods rarely investigate, at the level of problem formulation, the generalization capability of real-time detectors under such constrained inference budgets. To this end, we introduce Real-Time Single-Domain Generalized Object Detection (RT-SDGOD), which focuses on how real-time detectors can achieve cross-domain generalization with zero extra inference overhead by relying solely on training-time representation learning. We observe that, under domain shift, DETR-based real-time detectors mainly degrade through increased missed detections, which are rooted in limited and unstable object-level discriminative evidence. Based on this observation, we propose RT-SDGDet, a multi-evidence collaborative modeling framework for RT-SDGOD. The core idea is to enable multiple queries of the same object to collaboratively cover a more complete set of discriminative evidence while keeping that evidence modeling stable across views. Specifically, we use one-to-many (O2M) supervision to construct stable object-specific query groups, and further design Discriminative Evidence Diversity Learning (DEDL) to expand object-level evidence coverage and Dual-view Evidence Consistency Learning (DvECL) to improve evidence stability under appearance perturbations. Since all components are introduced only during training, our method incurs no extra inference overhead. Extensive experiments show that the proposed method achieves better generalization performance than existing approaches across multiple unseen target domains.
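The abstract does not give the loss formulations, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes each object has a group of query embeddings (as O2M supervision would provide), uses mean pairwise cosine similarity within a group as a stand-in for a diversity objective, and uses one minus cosine similarity between the same query under two views as a stand-in for a consistency objective. The function names `diversity_penalty` and `consistency_penalty`, the cosine-based forms, and the toy vectors are all hypothetical illustrations, not the paper's DEDL or DvECL definitions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon avoids division by zero


def diversity_penalty(group):
    """Mean pairwise cosine similarity among queries assigned to one object.

    Hypothetical stand-in for a diversity objective: a lower value means
    the queries in the group encode more distinct evidence.
    """
    pairs = [(i, j) for i in range(len(group)) for j in range(i + 1, len(group))]
    if not pairs:
        return 0.0
    return sum(cosine(group[i], group[j]) for i, j in pairs) / len(pairs)


def consistency_penalty(group_view_a, group_view_b):
    """Mean (1 - cosine) between each query under two augmented views.

    Hypothetical stand-in for a dual-view consistency objective: a lower
    value means the evidence is more stable under appearance perturbation.
    """
    if len(group_view_a) != len(group_view_b):
        raise ValueError("both views must contain the same number of queries")
    if not group_view_a:
        return 0.0
    total = sum(1.0 - cosine(a, b) for a, b in zip(group_view_a, group_view_b))
    return total / len(group_view_a)


# Toy query group for one object, seen under two views (made-up numbers).
view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
view_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

# Assumed equal weighting of the two terms, purely for illustration.
training_penalty = diversity_penalty(view_a) + consistency_penalty(view_a, view_b)
```

Both terms act only on training-time embeddings, which is consistent with the claim that the detector's inference path is unchanged; how the actual method weights or formulates them is not stated in the abstract.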