Microsoft Research, Virginia Tech | Apr 14, 2026 | arXiv:2604.13318

WebXSkill: Skill Learning for Autonomous Web Agents

Zhaoyang Wang, Qianhui Wu, Xuchao Zhang, Chaoyun Zhang, Wenlin Yao, Fazle Elahi Faisal, Baolin Peng, Si Qin, Suman Nath, Qingwei Lin, Chetan Bansal, Dongmei Zhang, Saravan Rajmohan, Huaxiu Yao

AI Summary

WebXSkill addresses the limitations of existing skill formulations for autonomous web agents by introducing executable skills that combine parameterized action programs with step-level natural language guidance. This framework extracts reusable action subsequences from synthetic agent trajectories, organizes them into a URL-based graph for context-aware retrieval, and deploys them in grounded (automated) or guided (step-by-step instruction) modes. Experiments on WebArena and WebVoyager show that WebXSkill improves task success rates by up to 9.8 and 12.9 points, respectively, compared to baseline methods.
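The summary does not specify how an executable skill is stored. The following is a minimal Python sketch, assuming a skill is a named list of steps where each step pairs an action template with a natural-language guidance string, and both are filled from the same parameters. In grounded mode the bound actions are returned for direct execution; in guided mode the bound guidance is rendered as numbered instructions for the agent to follow. The names `Skill`, `Step`, `grounded`, `guided`, and the action strings such as `click(#search-box)` are hypothetical and are not taken from the WebXSkill repository.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One skill step: a parameterized action plus its step-level guidance."""
    action: str    # hypothetical action template, e.g. "type(#search-box, {query})"
    guidance: str  # natural-language description of the same step


@dataclass
class Skill:
    name: str
    params: list[str]
    steps: list[Step] = field(default_factory=list)

    def _bind(self, template: str, args: dict[str, str]) -> str:
        # Refuse to run with unbound parameters instead of emitting a broken action.
        missing = [p for p in self.params if p not in args]
        if missing:
            raise ValueError(f"missing arguments: {missing}")
        return template.format(**args)

    def grounded(self, args: dict[str, str]) -> list[str]:
        """Grounded mode (sketch): concrete actions for automated multi-step execution."""
        return [self._bind(s.action, args) for s in self.steps]

    def guided(self, args: dict[str, str]) -> str:
        """Guided mode (sketch): numbered instructions the agent follows with its own planning."""
        return "\n".join(
            f"{i}. {self._bind(s.guidance, args)}" for i, s in enumerate(self.steps, 1)
        )


search = Skill(
    "search_products",
    ["query"],
    [
        Step("click(#search-box)", "Click the search box."),
        Step("type(#search-box, {query})", "Type '{query}' into the search box."),
        Step("press(Enter)", "Press Enter to submit the search."),
    ],
)
print(search.grounded({"query": "usb cable"}))
print(search.guided({"query": "usb cable"}))
```

Keeping the action and the guidance on the same `Step` is what lets one object serve both modes: the agent can execute the program outright, or read the guidance when a step fails and it needs to adapt.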

Key Contribution

WebXSkill lets autonomous web agents learn and execute skills that combine code-level precision with human-readable, step-level guidance.

Abstract

Autonomous web agents powered by large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. A key bottleneck is the grounding gap in existing skill formulations: textual workflow skills provide natural language guidance but cannot be directly executed, while code-based skills are executable but opaque to the agent, offering no step-level understanding for error recovery or adaptation. We introduce WebXSkill, a framework that bridges this gap with executable skills, each pairing a parameterized action program with step-level natural language guidance, enabling both direct execution and agent-driven adaptation. WebXSkill operates in three stages: skill extraction mines reusable action subsequences from readily available synthetic agent trajectories and abstracts them into parameterized skills, skill organization indexes skills into a URL-based graph for context-aware retrieval, and skill deployment exposes two complementary modes, grounded mode for fully automated multi-step execution and guided mode where skills serve as step-by-step instructions that the agent follows with its native planning. On WebArena and WebVoyager, WebXSkill improves task success rate by up to 9.8 and 12.9 points over the baseline, respectively, demonstrating the effectiveness of executable skills for web agents. The code is publicly available at https://github.com/aiming-lab/WebXSkill.
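The abstract names skill extraction and URL-based skill organization without giving their algorithms. As an illustration only, the following Python sketch swaps in two simple stand-ins: extraction as counting contiguous action subsequences that recur across at least `min_support` trajectories, and organization as a graph whose nodes are coarse URL keys (host plus first path segment), whose edges are observed page transitions, and whose nodes carry the skills that start there. Retrieval returns skills at the current node first, then skills on neighboring nodes. The functions `frequent_subsequences`, `url_node`, the class `SkillGraph`, and the example URLs are hypothetical, not the paper's method or API.

```python
from collections import defaultdict
from urllib.parse import urlparse


def frequent_subsequences(trajectories, min_len=2, max_len=4, min_support=2):
    """Contiguous action subsequences found in >= min_support distinct trajectories."""
    support = defaultdict(set)
    for t_idx, actions in enumerate(trajectories):
        for n in range(min_len, max_len + 1):
            for i in range(len(actions) - n + 1):
                support[tuple(actions[i:i + n])].add(t_idx)
    return {seq: len(ids) for seq, ids in support.items() if len(ids) >= min_support}


def url_node(url):
    """Collapse a URL to a coarse node key: host plus first path segment."""
    parsed = urlparse(url)
    first = parsed.path.strip("/").split("/")[0]
    return f"{parsed.netloc}/{first}"


class SkillGraph:
    """Nodes are URL keys, edges are page transitions, skills are attached to nodes."""

    def __init__(self):
        self.skills = defaultdict(list)  # node -> skill names starting on that page
        self.edges = defaultdict(set)    # node -> nodes reachable in one transition

    def add_skill(self, start_url, skill_name):
        self.skills[url_node(start_url)].append(skill_name)

    def add_transition(self, src_url, dst_url):
        a, b = url_node(src_url), url_node(dst_url)
        if a != b:
            self.edges[a].add(b)

    def retrieve(self, current_url, hops=1):
        """Breadth-first: skills at the current node, then within `hops` transitions."""
        start = url_node(current_url)
        seen, frontier, result = {start}, [start], list(self.skills[start])
        for _ in range(hops):
            nxt = []
            for node in frontier:
                for nb in sorted(self.edges[node]):
                    if nb not in seen:
                        seen.add(nb)
                        nxt.append(nb)
                        result.extend(self.skills[nb])
            frontier = nxt
        return result


trajs = [
    ["click:search", "type:query", "press:enter", "click:result"],
    ["click:search", "type:query", "press:enter"],
    ["click:cart", "click:checkout"],
]
freq = frequent_subsequences(trajs)

g = SkillGraph()
g.add_skill("https://shop.example.com/search", "search_products")
g.add_skill("https://shop.example.com/cart", "checkout_cart")
g.add_transition("https://shop.example.com/search?q=usb", "https://shop.example.com/cart")
print(g.retrieve("https://shop.example.com/search/results"))
```

Ordering results by graph distance is one plausible reading of "context-aware retrieval": skills usable on the current page come first, while skills one navigation step away remain available for planning.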

Code Generation & Program Synthesis | Natural Language Processing | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
