Mila · Concordia University · KU · Université Claude Bernard Lyon · May 28, 2026 · arXiv:2605.29927

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xingyu Lu, Leila Kosseim

AI Summary

This paper introduces PlanAhead, a static planner-executor framework, to empirically evaluate how different natural language plan representations (sequential subgoals, narrative, pseudocode, checklist) affect the performance of LLM-based web agents. The authors categorize WebArena tasks into difficulty levels and use Achievement Rate (AR) and Solved-Task Consistency (STC) to compare plan representations across different LLMs. The study finds that both the plan representation and the underlying LLM significantly influence web-agent robustness and task success on hard WebArena tasks.

Key Contribution

How you represent a plan and which LLM generates it both significantly affect the robustness and task success of web agents.

Abstract

Despite recent advances, LLM-based web agents still struggle with limited exploration, omission of critical steps, and sensitivity to task constraints. Prior work suggests that many of these failures stem from weaknesses in planning, yet the impact of alternative natural language plan representations remains unexplored. To address this, we introduce PlanAhead, a static planner-executor framework that evaluates the impact of plan representation on agent performance. We first automatically categorize WebArena tasks into 3 difficulty levels, enabling consistent difficulty grading without human annotation. We then systematically evaluate 4 plan representations (sequential subgoals, narrative, pseudocode, and checklist) on the tasks categorized as hard, across different families of multimodal LLM-powered agents (OpenAI, Alibaba, and Google). To account for stochastic variability, we introduce two novel evaluation metrics: Achievement Rate (AR) and Solved-Task Consistency (STC). Our results show that both the plan formulation and the underlying LLM generating the plan significantly influence web-agent robustness and task success.
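
As a rough illustration of what a static planner-executor setup could look like, the sketch below generates a plan once in a chosen representation and then lets an executor act on it without replanning. This is not the PlanAhead implementation: the reading of "static" as "plan once, never revise", the names `PlanFormat`, `FORMAT_HINTS`, `build_planner_prompt`, and `run_static`, the prompt wording, and the `"stop"` action are all assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, List


class PlanFormat(Enum):
    """The four plan representations named in the abstract."""
    SEQUENTIAL_SUBGOALS = "sequential subgoals"
    NARRATIVE = "narrative"
    PSEUDOCODE = "pseudocode"
    CHECKLIST = "checklist"


# Hypothetical formatting instructions; the paper's actual prompts are not shown here.
FORMAT_HINTS = {
    PlanFormat.SEQUENTIAL_SUBGOALS: "Write the plan as a numbered list of subgoals, in order.",
    PlanFormat.NARRATIVE: "Write the plan as a short paragraph of prose.",
    PlanFormat.PSEUDOCODE: "Write the plan as pseudocode with explicit control flow.",
    PlanFormat.CHECKLIST: "Write the plan as a checklist of items to tick off.",
}


def build_planner_prompt(task: str, fmt: PlanFormat) -> str:
    """Combine the task description with the format-specific instruction."""
    return f"Task: {task}\n{FORMAT_HINTS[fmt]}"


def run_static(
    task: str,
    fmt: PlanFormat,
    planner: Callable[[str], str],
    executor: Callable[[str, List[str]], str],
    max_steps: int = 5,
) -> List[str]:
    """Plan once, then execute step by step against the fixed plan."""
    # Assumption: "static" means the planner is called exactly once per task.
    plan = planner(build_planner_prompt(task, fmt))
    actions: List[str] = []
    for _ in range(max_steps):
        # The executor sees the fixed plan plus the actions taken so far.
        action = executor(plan, actions)
        actions.append(action)
        if action == "stop":  # hypothetical terminal action
            break
    return actions
```

In this sketch, `planner` and `executor` are plain callables, so an LLM client can be swapped in for either one, and the same task can be rerun under each `PlanFormat` to compare representations while the executor stays fixed.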

Reasoning & Chain-of-Thought · Tool Use & Agents · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 25

Year: 2026

Venue: N/A
