School of Artificial Intelligence · Shenzhen University of Advanced Technology · TJU · Jun 8, 2026 · arXiv:2606.09360

ExDet: Open-Domain Open-Vocabulary Detection with Cross-modal Extrapolation and Rectification

Yupeng Zhang, Yuzhong Feng, Ruize Han, Zhiwei Chen, Wei Feng, Liang Wan

AI Summary

This paper introduces ExDet, a framework for open-domain open-vocabulary detection (ODOVD) that improves how existing detectors generalize to novel categories and unseen domains. ExDet uses Text-Guided Extrapolation (TGE) to derive category- and domain-aware visual prototypes from text, and a Detector-Compatible Rectification (DCR) module to rectify classification representations at inference. Because it enhances existing detectors instead of training them from scratch, ExDet lowers training cost while improving classification accuracy on novel categories and unseen domains. The framework achieves state-of-the-art results on OD-LVIS, OV-LVIS, Objects365, and MSOSB.
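The TGE and DCR steps summarized above can be illustrated with a minimal Python sketch. It assumes a simple form of the DeltaSpace idea: a proxy visual prototype for a new category or domain is obtained by adding the text-embedding difference between a target prompt and an anchor prompt to a known visual embedding. DCR is modeled here, purely as a hypothetical stand-in, as a linear map fitted by gradient descent on prototype pairs only. This mirrors the detector training-free and real-data-free setting, but it is not the paper's actual architecture or loss. The function names (`extrapolate`, `fit_rectifier`) and the 3-d toy embeddings are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (VLM embeddings are usually compared this way)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def extrapolate(anchor_visual: Sequence[float], anchor_text: Sequence[float],
                target_text: Sequence[float]) -> Vector:
    """Assumed TGE-style proxy prototype: shift a known visual embedding by the
    text-space delta, following the DeltaSpace idea that feature differences
    in the image and text spaces are aligned."""
    shifted = [v + (t - a) for v, a, t in zip(anchor_visual, anchor_text, target_text)]
    return normalize(shifted)


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def identity(d: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def mse(w: Matrix, pairs: Sequence[Tuple[Vector, Vector]]) -> float:
    """Mean squared distance between rectified proxies W @ x and their targets y."""
    total = 0.0
    for x, y in pairs:
        total += sum((p - q) ** 2 for p, q in zip(matvec(w, x), y))
    return total / len(pairs)


def fit_rectifier(pairs: Sequence[Tuple[Vector, Vector]],
                  steps: int = 200, lr: float = 0.1) -> Matrix:
    """Hypothetical DCR stand-in: a linear map W, initialized to the identity and
    fitted by gradient descent so that W @ proxy ~ source-domain prototype.
    Only prototype pairs are used: no images and no detector weights are involved."""
    d = len(pairs[0][0])
    w = identity(d)
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for x, y in pairs:
            err = [p - q for p, q in zip(matvec(w, x), y)]
            for i in range(d):
                for j in range(d):
                    grad[i][j] += 2.0 * err[i] * x[j] / len(pairs)
        w = [[w[i][j] - lr * grad[i][j] for j in range(d)] for i in range(d)]
    return w


# Toy 3-d embeddings, made up for illustration (real VLM embeddings are much larger).
car_visual_src = normalize([1.0, 0.2, 0.0])  # visual prototype of "car" in the source domain
car_text_src = normalize([1.0, 0.0, 0.0])    # "a photo of a car"
car_text_fog = normalize([1.0, 0.0, 0.5])    # "a foggy photo of a car"
bus_text_src = normalize([0.6, 0.8, 0.0])    # "a photo of a bus" (novel category)
bus_text_fog = normalize([0.6, 0.8, 0.5])    # "a foggy photo of a bus"

# Proxy visual prototypes inferred from text alone.
car_fog = extrapolate(car_visual_src, car_text_src, car_text_fog)
bus_src = extrapolate(car_visual_src, car_text_src, bus_text_src)
bus_fog = extrapolate(car_visual_src, car_text_src, bus_text_fog)

# Map unseen-domain proxies back toward their source-domain counterparts.
pairs = [(car_fog, car_visual_src), (bus_fog, bus_src)]
w = fit_rectifier(pairs)
print(mse(identity(3), pairs), "->", mse(w, pairs))
```

Because the rectifier is fitted only on text-derived prototypes, this toy version needs neither images nor detector gradients. Applying `matvec(w, feature)` to a region feature at inference is the assumed analogue of inserting DCR after the classification head.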

Key Contribution

ExDet achieves state-of-the-art open-domain open-vocabulary detection while substantially reducing training cost, by enhancing existing detectors with text-derived visual prototypes and a lightweight rectification module instead of training them from scratch.

Abstract

Open-domain open-vocabulary detection (ODOVD) requires detectors to generalize to both novel categories and unseen domains, making it more challenging than standard open-vocabulary detection. Existing methods typically train open-vocabulary detectors together with domain generalization modules from scratch, which incurs high training cost. We propose ExDet, a lightweight category-domain collaborative generalization framework for ODOVD that enhances the cross-category and cross-domain generalization of existing detectors. ExDet consists of three components: Text-Guided Extrapolation (TGE), a lightweight Detector-Compatible Rectification (DCR) module, and ExRPN. Specifically, TGE exploits the DeltaSpace property of vision-language models (VLMs) to infer category- and domain-aware proxy visual prototypes from text. DCR is learned from the TGE-generated prototypes without training the detector and without real data. At inference, it is inserted after the classification head to rectify representations toward a detector-compatible source-domain visual distribution, thereby improving classification of targets from novel categories and unseen domains. ExRPN recalibrates proposal scores by combining semantic similarity with RPN confidence, improving recall for novel and domain-shifted objects and providing better inputs for subsequent classification and DCR. ExDet achieves state-of-the-art performance on OD-LVIS, OV-LVIS, Objects365, and MSOSB.
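The ExRPN idea of recalibrating proposal scores can be sketched as follows. The abstract only states that semantic similarity is combined with RPN confidence. The weighted sum, the mixing weight `alpha`, the mapping of cosine similarity to [0, 1], and the names `recalibrate` and `top_k` are all assumptions made for this illustration, as are the toy embeddings.

```python
import math
from typing import List, Sequence, Tuple

Proposal = Tuple[float, List[float]]  # (RPN objectness confidence, region embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def recalibrate(proposals: Sequence[Proposal],
                text_embeddings: Sequence[Sequence[float]],
                alpha: float = 0.5) -> List[float]:
    """Hypothetical ExRPN-style score: blend RPN confidence with the best
    semantic match between the proposal and the vocabulary's text embeddings."""
    scores = []
    for rpn_conf, emb in proposals:
        sim = max(cosine(emb, t) for t in text_embeddings)
        sim01 = (sim + 1.0) / 2.0  # map cosine from [-1, 1] to [0, 1]
        scores.append(alpha * rpn_conf + (1.0 - alpha) * sim01)
    return scores


def top_k(proposals: Sequence[Proposal],
          text_embeddings: Sequence[Sequence[float]],
          k: int, alpha: float = 0.5) -> List[int]:
    """Indices of the k proposals with the highest recalibrated scores."""
    scores = recalibrate(proposals, text_embeddings, alpha)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy vocabulary embeddings, e.g. "zebra" and "truck" (made-up values).
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
proposals = [
    (0.4, [0.0, 0.0, 1.0]),   # background-like region, moderate RPN confidence
    (0.2, [1.0, 0.0, 0.0]),   # domain-shifted novel object, low RPN confidence
    (0.9, [0.0, -1.0, 0.0]),  # confident proposal with a weak semantic match
]
print(recalibrate(proposals, text_embeddings))
print(top_k(proposals, text_embeddings, k=2))
```

In this toy run the low-confidence but semantically matching proposal (index 1) outranks the background-like proposal (index 0), which is the kind of recall gain for novel and domain-shifted objects that ExRPN targets.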

Computer Vision · Multimodal Models · Training Efficiency & Optimization

Year: 2026

Venue: N/A
