University of Surrey | May 6, 2026 | arXiv:2605.04531

Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection

Lihua Zhou, Mao Ye, Xiatian Zhu, Nianxin Li, Changyi Ma, Shuaifeng Li

AI Summary

The paper introduces Reward-Guided Semantic Evolution (RGSE), a training-free test-time adaptation method for open-vocabulary object detection that addresses semantic misalignment in VLMs under distribution shift. RGSE refines text embeddings by perturbing them, evaluating the perturbations using cosine similarity with high-confidence visual proposals, and fusing them via reward-weighted averaging. Experiments show RGSE achieves state-of-the-art performance on multiple detection benchmarks with minimal overhead, without requiring backpropagation.

Key Contribution

Forget training, just nudge your text embeddings: RGSE closes the open-vocabulary object detection gap under distribution shift by directly and efficiently adapting text embeddings at test time.

Abstract

Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and the shifted visual embeddings of region proposals. Recent test-time adaptive object detection methods for VLM-based detectors either rely on costly backpropagation or bypass semantic misalignment via external memory; none directly and efficiently align text and vision in a training-free manner. To address this, we propose Reward-Guided Semantic Evolution (RGSE), a training-free framework that directly refines the text embeddings at test time. Inspired by evolutionary search, RGSE treats text embedding adaptation as a semantic search process: it perturbs text embeddings to generate candidate variants, evaluates them using cosine similarity with current and historical high-confidence visual proposals as a reward signal, and fuses them into a refined embedding through reward-weighted averaging. Without any backpropagation, RGSE achieves state-of-the-art performance across multiple detection benchmarks while adding minimal computational overhead. Our code will be open-sourced upon publication.
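The perturb, evaluate, and fuse loop described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `evolve_text_embedding`, the Gaussian perturbation, the softmax weighting, the mean over proposals as the reward, and the parameters `num_candidates`, `sigma`, and `temperature` are all assumptions made for illustration. The abstract only states that candidates are scored by cosine similarity with high-confidence visual proposals and fused by reward-weighted averaging.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def evolve_text_embedding(text_emb, visual_embs, num_candidates=16,
                          sigma=0.05, temperature=0.01, rng=None):
    """Hypothetical one-step refinement of a single class text embedding.

    text_emb:    (d,) text embedding of one class.
    visual_embs: (n, d) embeddings of high-confidence proposals; a caller
                 could stack current and historical proposals here.
    Returns the refined embedding, per-candidate rewards, and fusion weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    text_emb = l2_normalize(np.asarray(text_emb, dtype=np.float64))
    visual_embs = l2_normalize(np.asarray(visual_embs, dtype=np.float64))

    # 1) Perturb: Gaussian noise around the text embedding (assumed scheme).
    #    The unperturbed embedding is kept as candidate 0.
    noise = rng.normal(0.0, sigma, size=(num_candidates, text_emb.shape[0]))
    candidates = l2_normalize(np.vstack([text_emb[None, :], text_emb + noise]))

    # 2) Evaluate: reward = mean cosine similarity to the visual proposals.
    #    All vectors are unit length, so a dot product is a cosine similarity.
    rewards = (candidates @ visual_embs.T).mean(axis=1)

    # 3) Fuse: reward-weighted average, with weights from a softmax over
    #    rewards (assumed weighting; max is subtracted for stability).
    weights = np.exp((rewards - rewards.max()) / temperature)
    weights /= weights.sum()
    refined = l2_normalize(weights @ candidates)
    return refined, rewards, weights
```

No gradients are computed anywhere in this sketch; each step is a matrix product or an elementwise operation, which is consistent with the training-free, backpropagation-free framing above. In a detector, the refined embedding would replace the original class embedding when scoring region proposals.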

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
