This paper investigates how Markov Decision Process (MDP) design choices affect sim-to-real transfer in reinforcement learning for industrial process control. The authors systematically evaluate different MDP configurations, including state composition, reward formulation, and dynamics models, on a color mixing task. The key finding is that physics-based dynamics models reach up to 50% real-world success under strict precision constraints, where simplified models fail entirely.
Physics-based dynamics models can make or break sim-to-real reinforcement learning, reaching up to 50% real-world success in an industrial control task where simplified models fail entirely.
Reinforcement Learning (RL) has demonstrated strong potential for industrial process control, yet policies trained in simulation often suffer from a significant sim-to-real gap when deployed on physical hardware. This work systematically analyzes how core Markov Decision Process (MDP) design choices -- state composition, target inclusion, reward formulation, termination criteria, and environment dynamics models -- affect this transfer. Using a color mixing task, we evaluate different MDP configurations and mixing dynamics across simulation and real-world experiments. We validate our findings on physical hardware, demonstrating that physics-based dynamics models achieve up to 50% real-world success under strict precision constraints where simplified models fail entirely. Our results provide practical MDP design guidelines for deploying RL in industrial process control.
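To make the design axes named in the abstract concrete, the following is a minimal sketch of how target inclusion, reward formulation, termination criteria, and the dynamics model could be exposed as switches in a toy color mixing environment. Everything in it is an assumption made for illustration: the names `MDPConfig`, `ColorMixEnv`, and `linear_mix`, the three RGB base colors, the tolerance value, and the volume-weighted linear mixing rule are hypothetical and are not taken from the paper. The linear rule stands in for a simplified dynamics model; a physics-based model would be passed in through the `dynamics` argument instead.

```python
from dataclasses import dataclass

@dataclass
class MDPConfig:
    """Hypothetical switches for the MDP design choices listed in the abstract."""
    include_target: bool = True   # target inclusion: append target color to the state
    dense_reward: bool = True     # reward formulation: negative error vs. sparse success
    tolerance: float = 0.05       # termination: success threshold on color error (assumed)
    max_steps: int = 20           # termination: step budget per episode (assumed)

# Assumed base colors as RGB triples in [0, 1]; purely illustrative.
BASE_COLORS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

def linear_mix(volumes):
    """Simplified dynamics: volume-weighted average of the base colors."""
    total = sum(volumes)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(
        sum(v * c[i] for v, c in zip(volumes, BASE_COLORS)) / total
        for i in range(3)
    )

def color_error(a, b):
    """Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class ColorMixEnv:
    """Toy environment; the dynamics model is swappable via `dynamics`."""

    def __init__(self, config, target, dynamics=linear_mix):
        self.config = config
        self.target = tuple(target)
        self.dynamics = dynamics
        self.volumes = [0.0, 0.0, 0.0]
        self.steps = 0

    def _obs(self):
        color = tuple(self.dynamics(self.volumes))
        # State composition: optionally concatenate the target color.
        return color + self.target if self.config.include_target else color

    def reset(self):
        self.volumes = [0.0, 0.0, 0.0]
        self.steps = 0
        return self._obs()

    def step(self, action):
        # Action: non-negative volume added for each base color.
        self.volumes = [v + max(0.0, a) for v, a in zip(self.volumes, action)]
        self.steps += 1
        err = color_error(self.dynamics(self.volumes), self.target)
        success = err <= self.config.tolerance
        reward = -err if self.config.dense_reward else float(success)
        done = success or self.steps >= self.config.max_steps
        return self._obs(), reward, done, {"success": success, "error": err}
```

Under these assumptions, two configurations can be compared by constructing `ColorMixEnv(MDPConfig(dense_reward=False), target=(0.5, 0.5, 0.0))` alongside the default and training the same agent on each; replacing `linear_mix` with another callable changes only the dynamics model while leaving the rest of the MDP fixed.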