MIT CSAIL, CAS, CSIRO, HFUT, University of California, Xiamen University, XJTU
Jun 16, 2026 | arXiv:2606.17871

StepGuard: Guarding Web Navigation via Single-Step Calibration

Zhihao Cui, Yuchen Zhang, Xiyang Sun, Yaxiong Wang, Li Zhu, Jinpeng Hu, Liu Liu, Mengjia Li, Yujiao Wu

AI Summary

This paper introduces StepGuard, a framework that improves web navigation by addressing single-step fragility through two components: Dynamic Dual-Policy Optimization (DDPO) and Confidence-Guided Adaptive Navigation Reflection (CANR). DDPO dynamically switches between a navigation-first mode for exploration and an answer-first mode for question answering, which mitigates reward misalignment. CANR triggers reflection only on low-confidence steps, correcting single-step errors before they propagate. Together they improve both navigation and answer accuracy, and experiments show that StepGuard achieves state-of-the-art performance on standard web navigation benchmarks.

Key Contribution

StepGuard's dual-policy optimization improves navigation accuracy, and its confidence-guided reflection recalibrates single-step errors, together yielding new state-of-the-art results on standard web navigation benchmarks.

Abstract

Web navigation requires agents to follow natural language goals, interact with web pages, and produce accurate answers. While recent advances leverage vision-language models and reinforcement learning, existing methods still suffer from single-step fragility caused by reward misalignment and error propagation. To tackle reward entanglement, we design Dynamic Dual-Policy Optimization (DDPO), which dynamically switches between a navigation-first mode for exploration and an answer-first mode for question answering to mitigate reward conflict. To calibrate single-step errors, we propose Confidence-Guided Adaptive Navigation Reflection (CANR), a mechanism that estimates per-step confidence, triggers reflection only when necessary, and uses contrastive rewards to encourage self-correction. Combining these components, we develop StepGuard, a framework for Guarding Web Navigation via Single-Step Calibration. Experiments demonstrate that our approach significantly improves navigation and answer accuracy, setting new state-of-the-art performance on standard web navigation benchmarks.
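The abstract describes the two mechanisms only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes (all hypothetical): per-step confidence is the geometric-mean token probability of the generated action; CANR reflects when that confidence falls below a fixed threshold; the contrastive reward is the reward of the reflected step minus that of the original step; and DDPO's two modes are expressed as different weightings of a navigation reward and an answer reward, switched by a simple "evidence found" flag. The names `StepOutput`, `step_confidence`, `should_reflect`, `contrastive_reward`, `select_mode`, `mode_reward`, and `MODE_WEIGHTS`, the threshold 0.6, and the weights are invented for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch: names, threshold, and weights are illustrative assumptions,
# not StepGuard's published formulation.


@dataclass
class StepOutput:
    action: str
    token_logprobs: list[float]  # log-probabilities of the tokens forming the action


def step_confidence(step: StepOutput) -> float:
    """Assumed confidence proxy: geometric-mean token probability of the action."""
    if not step.token_logprobs:
        return 0.0
    return math.exp(sum(step.token_logprobs) / len(step.token_logprobs))


def should_reflect(step: StepOutput, threshold: float = 0.6) -> bool:
    """Trigger reflection only when confidence is below the (hypothetical) threshold."""
    return step_confidence(step) < threshold


def contrastive_reward(original_reward: float, reflected_reward: float) -> float:
    """Assumed contrastive form: reward the improvement of the reflected step."""
    return reflected_reward - original_reward


# Hypothetical mode-specific weights for (navigation reward, answer reward).
MODE_WEIGHTS = {
    "navigation_first": (1.0, 0.2),
    "answer_first": (0.2, 1.0),
}


def select_mode(answer_evidence_found: bool) -> str:
    """Assumed switching rule: go answer-first once answer evidence is on the page."""
    return "answer_first" if answer_evidence_found else "navigation_first"


def mode_reward(mode: str, nav_reward: float, answer_reward: float) -> float:
    """Combine the two rewards with the weights of the active mode."""
    w_nav, w_ans = MODE_WEIGHTS[mode]
    return w_nav * nav_reward + w_ans * answer_reward


if __name__ == "__main__":
    confident = StepOutput("click(#search)", [-0.05, -0.10, -0.02])
    uncertain = StepOutput("type(#q, 'price')", [-1.2, -0.9, -1.5])
    print(should_reflect(confident), should_reflect(uncertain))  # False True
```

Weighting the rewards per mode, instead of summing them with fixed weights, is one way to keep the exploration signal and the answer signal from competing at every step; the actual DDPO objective may differ.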

RLHF & Preference Learning, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
