CMU ML | Lambda | Apr 13, 2026 | arXiv:2604.11805

Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

Mihir Prabhudesai, Aryan Satpathy, Yangming Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak

AI Summary

This paper explores using physics simulators to generate synthetic question-answer pairs for training LLMs in physical reasoning, addressing the scarcity of physics data relative to mathematics. The authors train LLMs on this synthetic data with reinforcement learning and show that the models can learn effectively from simulated environments. The key result is a 5-10 percentage point improvement on International Physics Olympiad (IPhO) problems through zero-shot sim-to-real transfer, which points to physics simulators as scalable data sources.

Key Contribution

Forget scraping the internet for physics Q&A: this work shows that training LLMs only on data generated from physics simulators improves their performance on International Physics Olympiad problems by 5-10 percentage points.

Abstract

We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets to effectively train reasoning-capable models. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning. We generate random scenes in physics engines, create synthetic question-answer pairs from simulated interactions, and train LLMs using reinforcement learning on this synthetic data. Our models exhibit zero-shot sim-to-real transfer to real-world physics benchmarks: for example, training solely on synthetic simulated data improves performance on IPhO (International Physics Olympiad) problems by 5-10 percentage points across model sizes. These results demonstrate that physics simulators can act as scalable data generators, enabling LLMs to acquire deep physical reasoning skills beyond the limitations of internet-scale QA data. Code available at: https://sim2reason.github.io/.
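The pipeline described in the abstract (random scene, simulated interaction, synthetic QA pair, reward for reinforcement learning) can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written projectile integrator stands in for a real physics engine, and the names `simulate_projectile`, `make_qa_pair`, and `reward`, the question template, and the 2% tolerance are all assumptions made for illustration.

```python
import math
import random


def simulate_projectile(speed, angle_deg, g=9.81, dt=1e-4):
    """Toy stand-in for a physics engine: integrate a point mass launched
    from level ground until it lands, and return the horizontal range in m."""
    angle = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    while True:
        # Semi-implicit Euler step: update velocity first, then position.
        vy_new = vy - g * dt
        x_new = x + vx * dt
        y_new = y + vy_new * dt
        if y_new < 0.0:
            # Linearly interpolate to the point where the ball crosses y = 0.
            frac = y / (y - y_new)
            return x + frac * (x_new - x)
        x, y, vy = x_new, y_new, vy_new


def make_qa_pair(rng):
    """Sample a random scene and turn the simulated outcome into a QA pair.
    The question template is hypothetical."""
    speed = round(rng.uniform(5.0, 30.0), 1)
    angle = round(rng.uniform(20.0, 70.0), 1)
    question = (
        f"A ball is launched from level ground at {speed} m/s, "
        f"{angle} degrees above the horizontal. Ignoring air resistance "
        f"and taking g = 9.81 m/s^2, how far away does it land (in m)?"
    )
    answer = simulate_projectile(speed, angle)
    return question, answer


def reward(predicted, reference, rel_tol=0.02):
    """Binary reward (assumed design): 1.0 if the model's numeric answer is
    within rel_tol of the simulator's answer, else 0.0."""
    return 1.0 if math.isclose(predicted, reference, rel_tol=rel_tol) else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    question, answer = make_qa_pair(rng)
    print(question)
    print(f"simulated answer: {answer:.2f} m")
```

In a setup like this, the simulator's output is the only source of supervision: the reward checks a model's final numeric answer against the simulated value, so no human-written solutions are needed.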

Reasoning & Chain-of-Thought | Scientific Discovery & Drug Design | World Models & Planning

Citation Metrics

Citations: 0
Influential citations: 0
References: 29
Year: 2026
Venue: N/A
