DeepMindINRIAParis-SaclaySequeL teamApr 21, 2026arXiv:2604.19695

Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko

AI Summary

This paper introduces SmoothCruiser, a planning algorithm for entropy-regularized Markov Decision Processes (MDPs) and two-player games that leverages the smoothness induced by entropy regularization. By exploiting this smoothness, SmoothCruiser achieves a problem-independent sample complexity of O~(1/epsilon^4) for estimating the value function with accuracy epsilon. This is a significant improvement, as no polynomial sample complexity guarantees exist for non-regularized MDPs in the worst case.

Key Contribution

Entropy regularization makes planning provably easy: SmoothCruiser achieves polynomial sample complexity in MDPs where standard methods fail.

Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O~(1/epsilon^4) for a desired accuracy epsilon, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.

Training Efficiency & Optimization World Models & Planning

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Planning in entropy-regularized Markov decision processes and games

Related Papers