HKUST, PKU, Tencent AI | May 26, 2026 | arXiv:2605.26861

REVERSE: Reinforcing Evidence Verification and Search for Agentic Image Geo-localization

Furong Jia, Dacheng Yin, Kang Rong, Fengyun Rao, Jing Lyu

AI Summary

The paper introduces REVERSE, a reinforcement learning framework for image geo-localization that mimics the iterative reasoning process of human experts by explicitly modeling evidence search and verification. REVERSE learns to select informative image regions, formulate effective search queries, and discriminate between relevant and irrelevant evidence using tool-grounded trajectories and process rewards. Experiments on Im2GPS3k and YFCC4k datasets demonstrate that REVERSE outperforms existing retrieval-augmented methods and rivals much larger models.

Key Contribution

Forget brute-force scaling: REVERSE shows that teaching an agent *how* to search and verify evidence lets a 4B model outperform strong retrieval-augmented baselines and rival substantially larger models at image geo-localization.

Abstract

Image geo-localization aims to determine where a photograph was taken, a task that often requires more than recognizing visible landmarks. Human experts typically solve it through an iterative workflow: they inspect informative regions, form location hypotheses, seek external evidence, and revise their judgments as new clues appear. Existing methods only partially capture this process: direct prediction methods bypass evidence acquisition altogether, while retrieval-augmented methods introduce external evidence but usually provide limited supervision on the intermediate decisions of where to search, how to query, and how to filter noisy results. We present REVERSE, a framework that reinforces the interplay between evidence search and verification to enable multi-turn agentic reasoning. REVERSE teaches three intermediate decisions: where to look, what to query, and what evidence to trust. To support this, we construct tool-grounded trajectories with annotated region selections, search observations, and geo-informative evidence labels, and introduce process rewards for visual grounding, query utility, and evidence discrimination. An offline search cache makes retrieval observations stable and reusable during reinforcement learning, enabling dense supervision over noisy search results. With a 4B model, REVERSE outperforms strong retrieval-augmented baselines and rivals substantially larger models on Im2GPS3k and YFCC4k. Code is available at https://github.com/yonglleee/REVERSE.
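The offline search cache mentioned in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the class name `OfflineSearchCache`, the `search_fn` backend, the query normalization, and the JSON-on-disk layout are all hypothetical. The idea is only that a repeated query returns the same stored observation instead of hitting a live search engine again, so rewards computed over search results stay reproducible across training rollouts.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


class OfflineSearchCache:
    """Disk-backed cache mapping a search query to its stored results.

    Hypothetical sketch: the real REVERSE cache format is not described
    in the abstract.
    """

    def __init__(self, cache_dir: str, search_fn: Callable[[str], List[str]]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Assumed live search backend, called only on a cache miss.
        self.search_fn = search_fn

    def _path(self, query: str) -> Path:
        # Normalize case and whitespace so trivially different queries
        # share one cache entry (an assumption made for this sketch).
        normalized = " ".join(query.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.json"

    def search(self, query: str) -> List[str]:
        path = self._path(query)
        if path.exists():
            # Cache hit: return the stored observation unchanged.
            return json.loads(path.read_text(encoding="utf-8"))
        # Cache miss: query the backend once and persist the result.
        results = self.search_fn(query)
        path.write_text(json.dumps(results), encoding="utf-8")
        return results
```

With a cache like this, every rollout that issues the same query sees identical results, which is what would make dense supervision over noisy search results stable during reinforcement learning.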

Computer Vision | Multimodal Models | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
