The paper introduces WebChain, a large-scale dataset of 31,725 human-annotated web interaction trajectories (318k steps) with aligned visual, structural, and action data. A scalable data collection pipeline ensures coverage of complex, real-world tasks. Using WebChain, the authors propose a Dual Mid-Training recipe that separates spatial grounding from planning, achieving state-of-the-art results on WebChainBench and other public GUI benchmarks.
Forget synthetic data: WebChain offers the largest open-source dataset of real-world, human-annotated web interaction trajectories, unlocking a new level of realism for training web agents.
We introduce WebChain, the largest open-source dataset of human-annotated trajectories on real-world websites, designed to accelerate reproducible research in web agents. It contains 31,725 trajectories and 318k steps, featuring a core Triple Alignment of visual, structural, and action data to provide rich, multi-modal supervision. The data is collected via a scalable pipeline that ensures coverage of complex, high-value tasks often missed by synthetic methods. Leveraging this dataset, we propose a Dual Mid-Training recipe that decouples spatial grounding from planning, achieving state-of-the-art performance on our proposed WebChainBench and other public GUI benchmarks. Our work provides the data and insights necessary to build and rigorously evaluate the next generation of scalable web agents.
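To make the Triple Alignment idea concrete, here is a minimal sketch of how one trajectory might be represented, with each step carrying a visual, a structural, and an action component. The class and field names (`Step`, `Trajectory`, `screenshot_path`, `dom_snapshot`, `action`) are hypothetical illustrations, not the dataset's actual schema, which the abstract does not specify. The sketch also derives the approximate mean trajectory length from the two reported totals.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    """One interaction step. All field names are assumed, not WebChain's schema."""
    screenshot_path: str      # visual signal (hypothetical: path to a screenshot)
    dom_snapshot: str         # structural signal (hypothetical: serialized page structure)
    action: Dict[str, str]    # action signal (hypothetical: e.g. {"type": "click"})


@dataclass
class Trajectory:
    """A human-annotated task made of an ordered list of steps."""
    task: str
    steps: List[Step] = field(default_factory=list)


def mean_steps(total_steps: int, total_trajectories: int) -> float:
    """Average number of steps per trajectory."""
    return total_steps / total_trajectories


# Totals reported in the abstract; "318k" is rounded, so the mean is approximate.
avg = mean_steps(318_000, 31_725)

# A toy two-step trajectory using the hypothetical schema above.
example = Trajectory(
    task="illustrative task",
    steps=[
        Step("step0.png", "<html>...</html>", {"type": "click"}),
        Step("step1.png", "<html>...</html>", {"type": "type"}),
    ],
)
```

Under the reported totals, the mean works out to roughly ten steps per trajectory, which fits the abstract's emphasis on complex, multi-step tasks.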