Apr 13, 2026 · arXiv:2604.11386

ComSim: Building Scalable Real-World Robot Data Generation via Compositional Simulation

Yiran Qin, Jiahua Ma, Li Kang, Wenzhan Li, Yihan Jiao, Yihang Jiao, Xin Wen, Xiufeng Song, Heng Zhou, Jiwen Yu, Zhenfei Yin, Xihui Liu, Philip H. S. Torr, Yilun Du, Ruimao Zhang

AI Summary

This paper introduces Compositional Simulation (ComSim), a hybrid simulation approach combining classical and neural simulation to generate realistic action-video pairs for robot training. ComSim uses a closed-loop real-sim-real data augmentation pipeline, leveraging a small real-world dataset to train a neural simulator that transforms classical simulation videos into more realistic representations. Experiments show that ComSim significantly reduces the sim2real gap, leading to improved real-world policy performance.

Key Contribution

Generating robot training data that bridges the sim2real gap does not require painstakingly detailed simulation environments. Instead, a neural simulator trained on only a small amount of real-world data can transform classical simulation videos into realistic ones.

Abstract

Recent advancements in foundational models, such as large language models and world models, have greatly enhanced the capabilities of robotics, enabling robots to autonomously perform complex tasks. However, acquiring large-scale, high-quality training data for robotics remains a challenge, as it often requires substantial manual effort and is limited in its coverage of diverse real-world environments. To address this, we propose a novel hybrid approach called Compositional Simulation, which combines classical simulation and neural simulation to generate accurate action-video pairs while maintaining real-world consistency. Our approach utilizes a closed-loop real-sim-real data augmentation pipeline, leveraging a small amount of real-world data to generate diverse, large-scale training datasets that cover a broader spectrum of real-world scenarios. We train a neural simulator to transform classical simulation videos into real-world representations, improving the accuracy of policy models trained in real-world environments. Through extensive experiments, we demonstrate that our method significantly reduces the sim2real domain gap, resulting in higher success rates in real-world policy model training. Our approach offers a scalable solution for generating robust training data and bridging the gap between simulated and real-world robotics.
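The real-sim-real loop described in the abstract can be pictured with a toy data-flow sketch. This is not the authors' implementation: every name here (`classical_sim`, `fit_neural_sim`, `generate_dataset`, `Episode`) is hypothetical, frames are stand-in lists of floats rather than images, and the "neural simulator" is replaced by a simple least-squares gain and bias fit. The way sim/real pairs are formed (replaying real actions in the classical simulator) is also an assumption made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Frame = List[float]    # toy stand-in for an image: a flat list of pixel values
Video = List[Frame]
Actions = List[float]  # toy stand-in for a robot action sequence


@dataclass
class Episode:
    actions: Actions
    video: Video


def classical_sim(actions: Actions) -> Video:
    """Hypothetical classical simulator: action-accurate but visually synthetic."""
    state, video = 0.0, []
    for a in actions:
        state += a                      # integrate the action into the state
        video.append([state, state * 0.5])
    return video


def fit_neural_sim(pairs: List[Tuple[Video, Video]]) -> Callable[[Video], Video]:
    """Stand-in for neural simulator training: fit real ~= gain * sim + bias."""
    xs = [p for sim, _ in pairs for f in sim for p in f]
    ys = [p for _, real in pairs for f in real for p in f]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    gain = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 1.0
    bias = my - gain * mx
    return lambda video: [[gain * p + bias for p in f] for f in video]


def generate_dataset(real: List[Episode], n_new: int, seed: int = 0) -> List[Episode]:
    # real -> sim: replay the real actions in the classical simulator (assumption)
    pairs = [(classical_sim(ep.actions), ep.video) for ep in real]
    neural_sim = fit_neural_sim(pairs)
    # sim -> real: sample new actions, simulate them, then translate the videos
    rng = random.Random(seed)
    horizon = len(real[0].actions)
    dataset = []
    for _ in range(n_new):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        dataset.append(Episode(actions, neural_sim(classical_sim(actions))))
    return dataset
```

The point of the sketch is the division of labour: the classical simulator keeps actions and resulting motion consistent, while the learned translator, fitted on a small paired set, handles only appearance. This is why each generated episode remains a valid action-video pair.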

Data Curation & Synthetic Data · Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 1

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
