This paper synthesizes insights from over 150 studies on post-training reasoning data, highlighting its critical role in enhancing large reasoning models. By categorizing existing literature around four key questions (data objects, their utility, construction methods, and scalability), the authors create a structured framework that can guide future research and development in this area. The findings underscore the importance of reasoning data in the post-training phase, offering a comprehensive overview that can inform best practices and innovations in model enhancement.
The organization of post-training reasoning data into a cohesive framework reveals crucial insights that could accelerate advancements in large reasoning models.
Post-training has become a primary driver of recent progress in large reasoning models, and reasoning data are often the key variable determining whether this stage succeeds. Work on post-training reasoning data has grown rapidly, yet this literature remains scattered across dataset papers, reinforcement-learning recipes, reward-model studies, benchmarks, and frontier system reports. This paper is the first primer to synthesize over 150 key public studies and system reports on post-training reasoning data. We organize the field around four questions: what data objects exist, what makes them useful, how they are constructed, and how they scale. Together, this organization provides an attribution framework for future reasoning-data releases and post-training recipes.