This paper identifies and addresses "target-domain astigmatism" in cross-domain few-shot object detection (CD-FSOD), where models exhibit dispersed attention in target domains. To remedy this, the authors propose a center-periphery attention refinement framework inspired by the human fovea-style visual system, comprising positive pattern refinement, negative context modulation, and textual semantic alignment. Experiments on six CD-FSOD benchmarks report state-of-the-art detection accuracy, supporting the effectiveness of focused attention patterns for domain adaptation.
Object detectors in new visual domains suffer from "astigmatism," but mimicking the human eye's foveal vision can bring them into focus.
Cross-domain few-shot object detection (CD-FSOD) aims to adapt pretrained detectors from a source domain to target domains with limited annotations, a setting that suffers from severe domain shift and data scarcity. In this work, we identify a previously overlooked phenomenon: models exhibit dispersed and unfocused attention in target domains, leading to imprecise localization and redundant predictions, much like a human eye that cannot bring objects into focus. We therefore call it the target-domain Astigmatism problem. An analysis of attention distances across transformer layers reveals that regular fine-tuning inherently tends to remedy this problem, but the results remain far from satisfactory, and we aim to strengthen this tendency in this paper. Inspired by the human fovea-style visual system, we enhance the inherent trend of fine-tuning through a center-periphery attention refinement framework, which contains (1) a Positive Pattern Refinement module that reshapes attention toward semantic objects using class-specific prototypes, simulating the visual center region; (2) a Negative Context Modulation module that enhances boundary discrimination by modeling background context, simulating the visual periphery region; and (3) a Textual Semantic Alignment module that strengthens the center-periphery distinction through cross-modal cues. Our bio-inspired approach transforms astigmatic attention into focused patterns, substantially improving adaptation to target domains. Experiments on six challenging CD-FSOD benchmarks consistently demonstrate improved detection accuracy and establish new state-of-the-art results.
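The abstract does not define the attention-distance measure it analyzes across transformer layers. As a rough illustration only, the sketch below assumes the common "mean attention distance" used for vision transformers: for each query patch, the attention-weighted average of spatial distances to all other patches, averaged over queries. The function names, the patch-grid layout, and the absence of a class token are assumptions of this sketch, not details from the paper.

```python
import numpy as np


def patch_distance_matrix(grid_h, grid_w):
    """Pairwise Euclidean distances between patch positions, in patch units."""
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # (N, 2)
    diff = coords[:, None, :] - coords[None, :, :]                     # (N, N, 2)
    return np.linalg.norm(diff, axis=-1)                               # (N, N)


def mean_attention_distance(attn, grid_h, grid_w):
    """Hypothetical helper: mean attention distance per head.

    attn: array of shape (heads, N, N) with N = grid_h * grid_w, where each
    row is a softmax distribution over key patches (assumed, no class token).
    Returns an array of shape (heads,).
    """
    dist = patch_distance_matrix(grid_h, grid_w)
    # Expected distance for each query patch, then the average over queries.
    per_query = (attn * dist[None, :, :]).sum(axis=-1)  # (heads, N)
    return per_query.mean(axis=-1)


# Toy comparison on a 2x2 patch grid: attention confined to the query patch
# versus attention spread uniformly over all patches.
focused = np.eye(4)[None, :, :]
dispersed = np.full((1, 4, 4), 0.25)
print(mean_attention_distance(focused, 2, 2))
print(mean_attention_distance(dispersed, 2, 2))
```

Under this assumed measure, a larger value means attention is spread over more distant patches, which is one way to quantify the dispersed, unfocused attention described above. Computing it per layer before and after fine-tuning would show whether attention tightens.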