Yang Gao

\tilde{F}_{i}^{T}=\tilde{F}_{i-1}^{T}+\textit{MAP}(M_{i},F_{i}^{I}), (3)

where MAP denotes masked average pooling. By cascading this process from deep to shallow layers, CSP progressively expands object-consistent activations while suppressing background noise. Guided by the strong prior from SIA, the refinement jointly optimizes visual consistency and scoring reliability, yielding more precise and robust localization. Fig. 3 illustrates the effectiveness of this iterative process. To balance accuracy and efficiency, we set the number of iterations to 3.

Table 1: Comparison with OVD models, MLLMs, and RPNs. The best results are highlighted in bold. AR100/300/
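The update in Eq. (3) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes each feature map F_i^I has shape (C, H, W), each mask M_i has shape (H, W) at the same resolution, and every layer shares the same channel count C. The excerpt does not describe how the masks are produced, so they are passed in as inputs. The function names and the `eps` stabilizer are hypothetical.

```python
import numpy as np


def masked_average_pooling(mask, features, eps=1e-6):
    """Average a (C, H, W) feature map over the positions weighted by an (H, W) mask."""
    weights = mask.astype(features.dtype)
    # Weighted sum over the spatial axes, one value per channel.
    total = (features * weights[None, :, :]).sum(axis=(1, 2))
    # eps (an assumption) guards against division by zero for an empty mask.
    return total / (weights.sum() + eps)


def cascade_update(embedding, masks, feature_maps):
    """Accumulate F_i = F_{i-1} + MAP(M_i, F_i^I) over the given layers, in order."""
    for mask, features in zip(masks, feature_maps):
        embedding = embedding + masked_average_pooling(mask, features)
    return embedding


# Toy usage: two channels, a 2x2 map, and a diagonal mask reused for two iterations.
features = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 3.0)])
mask = np.array([[1, 0], [0, 1]])
refined = cascade_update(np.zeros(2), [mask, mask], [features, features])
```

Because the pooled vector is added to the running embedding rather than replacing it, each iteration contributes the mean feature of the currently masked region, which is the additive form that Eq. (3) states.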
