Mar 30, 2026 | arXiv:2603.28074

Koopman-based surrogate modeling for reinforcement-learning-control of Rayleigh-Bénard convection

AI Summary

This paper explores using Koopman-based Linear Recurrent Autoencoder Networks (LRANs) as surrogate models to accelerate reinforcement learning (RL) control of 2D Rayleigh-Bénard convection, a fluid dynamics problem that is expensive to simulate. The authors compare two training strategies: a surrogate trained on precomputed data generated with random actions, and a policy-aware surrogate trained iteratively on data collected from an evolving policy. Training on the surrogate alone reduces control performance, but pretraining on the surrogate and then training with direct numerical simulations (DNS) recovers state-of-the-art performance while cutting training time by more than 40%. Policy-aware training further improves surrogate accuracy in policy-relevant regions of the state space by mitigating distribution shift.
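
As a rough sketch of the model class named here, a Koopman-style recurrent autoencoder can be written as an encoder, a linear latent update, and a decoder. The symbols below (encoder \varphi, decoder \psi, latent matrix K, input matrix B, action a_t) are illustrative assumptions; in particular, the way the control input enters the latent dynamics is not specified in the abstract.

```latex
\begin{aligned}
z_t &= \varphi(x_t)            && \text{(encode the flow state)} \\
z_{t+1} &= K z_t + B a_t       && \text{(linear latent dynamics, assumed control-affine)} \\
\hat{x}_{t+1} &= \psi(z_{t+1}) && \text{(decode the predicted state)}
\end{aligned}
```

The appeal of this structure is that multi-step predictions only require repeated application of the linear map in latent space, which is far cheaper than advancing the governing equations.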

Key Contribution

Pretraining RL agents on a Koopman-based surrogate before training with DNS cuts training time by more than 40% while recovering state-of-the-art control performance, and policy-aware surrogate training mitigates distribution shift.

Abstract

Training reinforcement learning (RL) agents to control fluid dynamics systems is computationally expensive due to the high cost of direct numerical simulations (DNS) of the governing equations. Surrogate models offer a promising alternative by approximating the dynamics at a fraction of the computational cost, but their feasibility as training environments for RL is limited by distribution shifts, as policies induce state distributions not covered by the surrogate training data. In this work, we investigate the use of Linear Recurrent Autoencoder Networks (LRANs) for accelerating RL-based control of 2D Rayleigh-Bénard convection. We evaluate two training strategies: a surrogate trained on precomputed data generated with random actions, and a policy-aware surrogate trained iteratively using data collected from an evolving policy. Our results show that while surrogate-only training leads to reduced control performance, combining surrogates with DNS in a pretraining scheme recovers state-of-the-art performance while reducing training time by more than 40%. We demonstrate that policy-aware training mitigates the effects of distribution shift, enabling more accurate predictions in policy-relevant regions of the state space.
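
The policy-aware strategy described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the paper's setup: `dns_step` is a toy 2D system standing in for an expensive simulator, the surrogate is a linear least-squares fit rather than an LRAN, and policy improvement is a random search over feedback gains rather than an RL algorithm. All function names are hypothetical.

```python
import numpy as np

def dns_step(x, a):
    """Toy stand-in for an expensive simulator (NOT Rayleigh-Benard DNS)."""
    A = np.array([[0.9, 0.2], [-0.1, 0.95]])
    B = np.array([0.0, 0.1])
    return A @ x + B * a + 0.05 * np.array([0.0, np.sin(x[0])])

def collect(policy, n_steps, rng):
    """Roll out `policy` on the simulator; return states, actions, next states."""
    x = rng.normal(size=2)
    X, U, Y = [], [], []
    for _ in range(n_steps):
        a = policy(x) + 0.1 * rng.normal()  # small exploration noise
        y = dns_step(x, a)
        X.append(x); U.append([a]); Y.append(y)
        x = y
    return np.array(X), np.array(U), np.array(Y)

def fit_surrogate(X, U, Y):
    """Least-squares fit of y ~= [x, a] @ W (a linear surrogate, not an LRAN)."""
    Z = np.hstack([X, U])
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return W  # shape (3, 2)

def surrogate_cost(W, gain, x0, horizon=30):
    """Quadratic cost of the feedback a = -gain @ x, rolled out on the surrogate."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        a = -gain @ x
        cost += x @ x + 0.01 * a * a
        x = np.concatenate([x, [a]]) @ W
    return cost

def improve_policy(W, rng, n_candidates=200):
    """Random search over feedback gains, evaluated on the surrogate only."""
    x0 = np.array([1.0, 1.0])
    gains = rng.uniform(-5, 5, size=(n_candidates, 2))
    costs = [surrogate_cost(W, g, x0) for g in gains]
    return gains[int(np.argmin(costs))]

def policy_aware_training(n_iters=5, n_steps=200, seed=0):
    """Alternate between refitting the surrogate and improving the policy."""
    rng = np.random.default_rng(seed)
    # Iteration 0: data generated with random actions.
    X, U, Y = collect(lambda x: rng.uniform(-1, 1), n_steps, rng)
    gain = np.zeros(2)
    for _ in range(n_iters):
        W = fit_surrogate(X, U, Y)            # refit on all data so far
        gain = improve_policy(W, rng)         # cheap: surrogate rollouts only
        # Collect new data under the current policy and aggregate it,
        # so the surrogate sees the states this policy actually visits.
        Xn, Un, Yn = collect(lambda x, g=gain: -g @ x, n_steps, rng)
        X, U, Y = np.vstack([X, Xn]), np.vstack([U, Un]), np.vstack([Y, Yn])
    return fit_surrogate(X, U, Y), gain, len(X)

W, gain, n_samples = policy_aware_training()
```

The first pass through the loop corresponds to a surrogate trained only on random-action data. Later passes add data from the evolving policy, which is the mechanism the abstract credits with mitigating distribution shift.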

Robotics & Embodied AI | Scientific Discovery & Drug Design | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
