This paper introduces a matching-free training scheme for DETR-based object detectors, eliminating the need for the Hungarian algorithm. The core innovation is a Cross-Attention-based Query Selection (CAQS) module that uses encoded ground-truth information to probe decoder queries and learn implicit correspondences. Experiments show the method enhances training efficiency, reduces matching latency by over 50%, and achieves superior performance compared to existing state-of-the-art methods.
Ditch the Hungarian algorithm: this new DETR training scheme cuts matching latency by over 50% and improves performance by learning implicit correspondences between object queries and targets.
Recent DEtection TRansformer (DETR) based frameworks have achieved remarkable success in end-to-end object detection. However, their reliance on the Hungarian algorithm for bipartite matching between queries and ground truths introduces computational overhead and complicates the training dynamics. In this paper, we propose a novel matching-free training scheme for DETR-based detectors that eliminates the need for explicit heuristic matching. At the core of our approach is a dedicated Cross-Attention-based Query Selection (CAQS) module. Instead of performing a discrete assignment, we use encoded ground-truth information to probe the decoder queries through a cross-attention mechanism. By minimizing the weighted error between the queried results and the ground truths, the model autonomously learns the implicit correspondences between object queries and specific targets. This learned relationship in turn provides supervision signals for training the queries. Experimental results demonstrate that our method bypasses the traditional matching process and removes the discrete matching bottleneck through differentiable correspondence learning. It significantly improves training efficiency, reduces matching latency by over 50%, and achieves superior performance compared to existing state-of-the-art methods.
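The probing step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's CAQS implementation: the function name `soft_query_selection`, the tensor shapes, the single-head scaled dot-product attention, and the plain L1 error (standing in for the paper's weighted error, whose weights are not specified here) are all assumptions made for illustration. Encoded ground truths act as attention queries, decoder query features act as keys, and per-query box predictions act as values, so each attention row is a soft, differentiable stand-in for a discrete Hungarian assignment.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_query_selection(gt_embed, query_embed, pred_boxes, gt_boxes):
    """Hypothetical sketch of probing decoder queries with encoded ground truths.

    gt_embed:    (G, D) encoded ground truths, used as attention queries (probes)
    query_embed: (Q, D) decoder query features, used as attention keys
    pred_boxes:  (Q, 4) per-query box predictions, used as attention values
    gt_boxes:    (G, 4) ground-truth boxes
    """
    d = gt_embed.shape[-1]
    # Scaled dot-product scores between every ground truth and every decoder query.
    scores = gt_embed @ query_embed.T / np.sqrt(d)   # (G, Q)
    # Each row is a soft correspondence: a distribution over decoder queries.
    attn = softmax(scores, axis=-1)                  # (G, Q)
    # "Queried result": attention-weighted mix of the predictions for each ground truth.
    probed = attn @ pred_boxes                       # (G, 4)
    # Assumption: plain mean L1 error between queried results and ground truths.
    loss = np.abs(probed - gt_boxes).mean()
    return attn, probed, loss

# Toy inputs: 3 ground truths, 10 decoder queries, 16-dimensional embeddings.
rng = np.random.default_rng(0)
G, Q, D = 3, 10, 16
gt_embed = rng.normal(size=(G, D))
query_embed = rng.normal(size=(Q, D))
pred_boxes = rng.uniform(size=(Q, 4))
gt_boxes = rng.uniform(size=(G, 4))

attn, probed, loss = soft_query_selection(gt_embed, query_embed, pred_boxes, gt_boxes)
print(attn.shape, probed.shape, float(loss))
```

In a real training loop the loss would be backpropagated through both the attention weights and the predictions using an autodiff framework, which is what lets the correspondences be learned rather than computed by a separate combinatorial solver.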