Mar 15, 2026

arXiv:2603.14248

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

Mohamed Aghzal, Gregory J. Stein, Ziyu Yao

AI Summary

This paper introduces a hierarchical planning framework to analyze failure modes in LLM-based web agents across high-level planning, low-level execution, and replanning stages. The authors compare agents using PDDL plans versus natural language plans, finding that PDDL leads to more concise strategies. However, the dominant bottleneck is low-level execution, suggesting that improvements in perceptual grounding are crucial for enhancing web agent reliability.

Key Contribution

LLM web agents struggle more with perceptual grounding and low-level execution than high-level reasoning, challenging the assumption that better reasoning alone will solve web navigation.

Abstract

Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze web agents across three layers (i.e., high-level planning, low-level execution, and replanning), enabling process-based evaluation of reasoning, grounding, and recovery. Our experiments show that structured Planning Domain Definition Language (PDDL) plans produce more concise and goal-directed strategies than natural language (NL) plans, but low-level execution remains the dominant bottleneck. These results indicate that improving perceptual grounding and adaptive control, not only high-level reasoning, is critical for achieving human-level reliability. This hierarchical perspective provides a principled foundation for diagnosing and advancing LLM web agents.
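The three-layer decomposition described in the abstract can be illustrated with a toy control loop. This is only a minimal sketch, not the authors' implementation: the names `run_episode`, `EpisodeLog`, and the stub `plan`, `execute`, and `replan` callables are hypothetical, and attributing each failure to exactly one layer is a simplifying assumption made here for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EpisodeLog:
    """Outcome of one episode plus a per-layer failure tally (hypothetical)."""
    success: bool = False
    failures: Counter = field(default_factory=Counter)


def run_episode(
    plan: Callable[[str], List[str]],
    execute: Callable[[str], bool],
    replan: Callable[[str, str], List[str]],
    goal: str,
    max_replans: int = 2,
) -> EpisodeLog:
    """Run plan -> execute -> replan and record which layer each failure hit."""
    log = EpisodeLog()
    subgoals = plan(goal)  # high-level planning layer
    if not subgoals:
        log.failures["high_level_planning"] += 1
        return log

    replans = 0
    i = 0
    while i < len(subgoals):
        if execute(subgoals[i]):  # low-level execution layer
            i += 1
            continue
        log.failures["low_level_execution"] += 1
        if replans >= max_replans:
            return log
        replans += 1
        new_tail = replan(goal, subgoals[i])  # replanning layer
        if not new_tail:
            log.failures["replanning"] += 1
            return log
        # Keep completed subgoals, swap in the repaired remainder.
        subgoals = subgoals[:i] + new_tail

    log.success = True
    return log


# Toy stubs (assumptions for illustration only): the first click attempt
# fails, and the replanner proposes a single alternative step.
def toy_plan(goal: str) -> List[str]:
    return ["open_search", "click_result"]


def toy_execute(subgoal: str) -> bool:
    return subgoal != "click_result"


def toy_replan(goal: str, failed_subgoal: str) -> List[str]:
    return [failed_subgoal + "_retry"]


log = run_episode(toy_plan, toy_execute, toy_replan, goal="find a paper")
```

Tallying failures per layer in this way is one simple route to the kind of process-based evaluation the abstract contrasts with end-to-end success rates.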

Eval Frameworks & Benchmarks, Reasoning & Chain-of-Thought, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
