Search papers, labs, and topics across Lattice.
WebXSkill addresses the limitations of existing skill formulations for autonomous web agents by introducing executable skills that combine parameterized action programs with step-level natural language guidance. This framework extracts reusable action subsequences from synthetic agent trajectories, organizes them into a URL-based graph for context-aware retrieval, and deploys them in grounded (automated) or guided (step-by-step instruction) modes. Experiments on WebArena and WebVoyager show that WebXSkill improves task success rates by up to 9.8 and 12.9 points, respectively, compared to baseline methods.
Autonomous web agents get a serious upgrade with WebXSkill, which lets them learn and execute skills with both code-level precision and human-readable guidance.
Autonomous web agents powered by large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. A key bottleneck is the grounding gap in existing skill formulations: textual workflow skills provide natural language guidance but cannot be directly executed, while code-based skills are executable but opaque to the agent, offering no step-level understanding for error recovery or adaptation. We introduce WebXSkill, a framework that bridges this gap with executable skills, each pairing a parameterized action program with step-level natural language guidance, enabling both direct execution and agent-driven adaptation. WebXSkill operates in three stages: skill extraction mines reusable action subsequences from readily available synthetic agent trajectories and abstracts them into parameterized skills, skill organization indexes skills into a URL-based graph for context-aware retrieval, and skill deployment exposes two complementary modes, grounded mode for fully automated multi-step execution and guided mode where skills serve as step-by-step instructions that the agent follows with its native planning. On WebArena and WebVoyager, WebXSkill improves task success rate by up to 9.8 and 12.9 points over the baseline, respectively, demonstrating the effectiveness of executable skills for web agents. The code is publicly available at https://github.com/aiming-lab/WebXSkill.