D3-Gym, a new dataset for data-driven discovery, provides 565 verifiable environments extracted from 239 real scientific repositories across four disciplines, each with a natural language instruction, an executable environment, datasets, a reference solution, and a synthesized evaluation script. The synthesized evaluation scripts reach 87.5% agreement with human-annotated gold standards, supporting their scientific soundness. Training Qwen3 models on trajectories sampled from D3-Gym improves performance on ScienceAgentBench across model sizes, substantially narrowing the gap with proprietary models.
Training on D3-Gym, a new dataset of real-world scientific environments, boosts Qwen3-32B performance on ScienceAgentBench by 7.8 absolute points, substantially narrowing the gap with strong proprietary models.
Despite recent progress in language models and agents for scientific data-driven discovery, further advancing their capabilities is held back by the absence of verifiable environments representing real-world scientific tasks. To fill this gap, we introduce D3-Gym, the first automatically constructed dataset with verifiable environments for scientific Data-Driven Discovery. D3-Gym comprises (1) 565 tasks sourced from 239 real scientific repositories across four disciplines, where (2) each task is equipped with a natural language instruction, an executable environment with pre-installed dependencies, input dataset and artifact previews, a reference code solution, and an automatically synthesized evaluation script. Rigorous evaluation of the quality of the verification signal in D3-Gym confirms that our evaluation scripts achieve 87.5% agreement with human-annotated gold standards and strong alignment in domain-specific evaluation logic, showing their scientific soundness. Further, training on trajectories sampled from D3-Gym yields consistent and substantial gains across Qwen3 models of varying sizes on ScienceAgentBench, boosting Qwen3-32B by 7.8 absolute points and substantially shrinking the gap with strong proprietary models. All D3-Gym artifacts (environments, creation workflow, trajectories, and models) can be found at https://github.com/OSU-NLP-Group/D3-Gym.
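To make the task components and the agreement figure concrete, here is a minimal Python sketch. It is not taken from the D3-Gym repository: the class name `D3GymTask`, its field names, and the helper `agreement_rate` are hypothetical, chosen only to mirror the five components listed in the abstract. It also assumes that agreement is measured as the fraction of cases where a synthesized script and the human-annotated gold standard give the same pass/fail verdict, which the abstract does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class D3GymTask:
    """Hypothetical record for one task; field names are illustrative only."""
    instruction: str                 # natural language instruction
    environment: str                 # identifier of the executable environment
    dataset_previews: List[str] = field(default_factory=list)  # input dataset and artifact previews
    reference_solution: str = ""     # reference code solution
    evaluation_script: str = ""      # automatically synthesized evaluation script


def agreement_rate(script_verdicts: List[bool], gold_verdicts: List[bool]) -> float:
    """Fraction of cases where the script verdict equals the gold verdict.

    Assumption: agreement is simple pass/fail matching; the paper may
    define it differently.
    """
    if len(script_verdicts) != len(gold_verdicts):
        raise ValueError("verdict lists must have the same length")
    if not script_verdicts:
        raise ValueError("at least one verdict pair is required")
    matches = sum(s == g for s, g in zip(script_verdicts, gold_verdicts))
    return matches / len(script_verdicts)


# Toy verdicts for illustration only (not data from the paper).
toy_script = [True, False, True, True, False, True, True, False]
toy_gold = [True, False, True, True, False, True, True, True]
toy_agreement = agreement_rate(toy_script, toy_gold)
```

With these toy verdicts, seven of eight pairs match, so `toy_agreement` is 0.875. The numbers are chosen only to show the arithmetic behind a percentage like the one reported, not to reproduce the paper's evaluation.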