Feb 23, 2026 | arXiv:2602.20117

ReSyn: Autonomously Scaling Synthetic Environments for Reasoning Models

Andre He, Nathaniel Weir, Kaj Bostrom, Allen Nie, Darion Cassel, Sam Bayless, Huzefa Rangwala

AI Summary

The paper introduces ReSyn, a pipeline for autonomously generating diverse reasoning environments with instance generators and verifiers to scale Reinforcement Learning with Verifiable Rewards (RLVR) for training reasoning language models (RLMs). This approach addresses the limitations of existing methods that are either solution-centric or rely on limited hand-crafted environments. Training a Qwen2.5-7B-Instruct model with RL on ReSyn data resulted in significant performance gains across reasoning and math benchmarks, including a 27% relative improvement on the BBEH benchmark.

Key Contribution

Scaling synthetic environments with automatically generated tasks and verifiers unlocks significant reasoning improvements in language models, achieving a 27% relative gain on BBEH.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising approach for training reasoning language models (RLMs) by leveraging supervision from verifiers. Although verifier implementation is easier than solution annotation for many tasks, existing synthetic data generation methods remain largely solution-centric, while verifier-based methods rely on a few hand-crafted procedural environments. In this work, we scale RLVR by introducing ReSyn, a pipeline that generates diverse reasoning environments equipped with instance generators and verifiers, covering tasks such as constraint satisfaction, algorithmic puzzles, and spatial reasoning. A Qwen2.5-7B-Instruct model trained with RL on ReSyn data achieves consistent gains across reasoning benchmarks and out-of-domain math benchmarks, including a 27% relative improvement on the challenging BBEH benchmark. Ablations show that verifier-based supervision and increased task diversity both contribute significantly, providing empirical evidence that generating reasoning environments at scale can enhance reasoning abilities in RLMs.
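To make the idea of an environment "equipped with an instance generator and a verifier" concrete, here is a minimal sketch of what one such environment could look like. It is not taken from the paper: the subset-sum task, the names `SubsetSumInstance`, `generate_instance`, `render_prompt`, and `verify`, and the binary 0/1 reward are all illustrative assumptions. The point it shows is that the verifier only checks a proposed answer against the constraints, so no reference solution has to be annotated.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetSumInstance:
    """Hypothetical task instance: pick indices whose values sum to `target`."""
    numbers: tuple[int, ...]
    target: int


def generate_instance(rng: random.Random, n: int = 8) -> SubsetSumInstance:
    """Instance generator (assumed design): plant a solution so every instance is solvable."""
    numbers = tuple(rng.randint(1, 50) for _ in range(n))
    k = rng.randint(2, n - 1)
    chosen = rng.sample(range(n), k)
    target = sum(numbers[i] for i in chosen)
    return SubsetSumInstance(numbers, target)


def render_prompt(inst: SubsetSumInstance) -> str:
    """Turn an instance into the text shown to the model."""
    return (
        f"Numbers (0-indexed): {list(inst.numbers)}\n"
        f"Give distinct indices whose values sum to {inst.target}, "
        "separated by spaces."
    )


def verify(inst: SubsetSumInstance, answer: str) -> float:
    """Verifier: return reward 1.0 if the answer satisfies the constraints, else 0.0.

    Any valid subset is accepted, not only the one planted by the generator.
    """
    try:
        indices = [int(tok) for tok in answer.replace(",", " ").split()]
    except ValueError:
        return 0.0  # answer is not a list of integers
    if not indices or len(set(indices)) != len(indices):
        return 0.0  # empty answer or repeated index
    if any(i < 0 or i >= len(inst.numbers) for i in indices):
        return 0.0  # index out of range
    return 1.0 if sum(inst.numbers[i] for i in indices) == inst.target else 0.0
```

In an RLVR loop, the text from `render_prompt` would be given to the model and the value returned by `verify` would serve as the reward for its response. Scaling along the lines the abstract describes would mean producing many such generator and verifier pairs across different task families, rather than hand-writing a few.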

Data Curation & Synthetic Data | Reasoning & Chain-of-Thought | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
