The paper introduces Persona2Web, a new benchmark designed to evaluate the ability of web agents to personalize their responses based on implicit user preferences inferred from long-term user history. This benchmark addresses the limitation of current web agents that struggle with ambiguous queries requiring contextual reasoning beyond explicit instructions. Experiments using Persona2Web across different agent architectures and backbone models reveal significant challenges in achieving effective personalization in web agents.
Current web agents struggle to infer user preferences from history, highlighting a critical gap in personalization that Persona2Web is designed to address.
Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user preferences and contexts. To address this challenge, we present Persona2Web, the first benchmark for evaluating personalized web agents on the real open web, built upon the clarify-to-personalize principle, which requires agents to resolve ambiguity based on user history rather than relying on explicit instructions. Persona2Web consists of: (1) user histories that reveal preferences implicitly over long time spans, (2) ambiguous queries that require agents to infer implicit user preferences, and (3) a reasoning-aware evaluation framework that enables fine-grained assessment of personalization. We conduct extensive experiments across various agent architectures, backbone models, history access schemes, and queries with varying ambiguity levels, revealing key challenges in personalized web agent behavior. For reproducibility, our code and datasets are publicly available at https://anonymous.4open.science/r/Persona2Web-73E8.