INRIA, Paris-Saclay | Apr 20, 2026 | arXiv:2604.18312

Scale-free adaptive planning for deterministic dynamics & discounted rewards

Peter L. Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko

AI Summary

This paper introduces Platypoos, a scale-free planning algorithm for environments with deterministic dynamics, stochastic rewards, and discounted returns. Platypoos adapts to the unknown scale and smoothness of the reward function, without assuming bounded rewards or a known optimal value function. The authors give a sample complexity analysis that improves upon prior work and holds simultaneously over a broad range of discount factors and reward scales, and they establish a matching lower bound showing that the analysis is optimal up to constants.

Key Contribution

Platypoos adapts to the unknown scale and smoothness of the reward function, and its sample complexity analysis is matched by a lower bound up to constants.

Abstract

We address the problem of planning in an environment with deterministic dynamics and stochastic rewards with discounted returns. The optimal value function is not known, nor are the rewards bounded. We propose Platypoos, a simple scale-free planning algorithm that adapts to the unknown scale and smoothness of the reward function. We provide a sample complexity analysis for Platypoos that improves upon prior work and holds simultaneously over a broad range of discount factors and reward scales, without the algorithm knowing them. We also establish a matching lower bound showing our analysis is optimal up to constants.
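
To make the setting concrete, here is a minimal Python sketch of planning with deterministic dynamics, noisy unbounded rewards, and a discount factor. It is not Platypoos: it is a naive uniform baseline that evaluates every action sequence of a fixed depth with the same number of samples. The functions `step`, `mean_reward`, and `sample_reward`, the two-action set, and the Gaussian noise model are all hypothetical choices made for illustration and do not come from the paper.

```python
import random
from itertools import product

ACTIONS = (0, 1)  # hypothetical two-action problem


def step(state, action):
    """Hypothetical deterministic dynamics: next state is a fixed function of (state, action)."""
    return (2 * state + action) % 7


def mean_reward(state, action):
    """Hypothetical mean reward; a planner would not know its scale in advance."""
    return 0.5 * state + action


def sample_reward(state, action, rng, noise):
    """Stochastic, unbounded reward: mean plus Gaussian noise (an assumed noise model)."""
    return mean_reward(state, action) + rng.gauss(0.0, noise)


def estimate_return(state, actions, gamma, rng, noise=1.0, n_samples=1):
    """Monte Carlo estimate of the truncated discounted return of one action sequence."""
    total = 0.0
    for _ in range(n_samples):
        s, ret = state, 0.0
        for t, a in enumerate(actions):
            ret += gamma ** t * sample_reward(s, a, rng, noise)
            s = step(s, a)  # dynamics are deterministic, so only rewards are noisy
        total += ret
    return total / n_samples


def uniform_plan(state, depth, gamma, rng, noise=1.0, n_samples=50):
    """Naive baseline: score every length-`depth` sequence equally, return the best one."""
    return max(
        product(ACTIONS, repeat=depth),
        key=lambda seq: estimate_return(state, seq, gamma, rng, noise, n_samples),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    best = uniform_plan(state=0, depth=3, gamma=0.9, rng=rng)
    print("recommended first action:", best[0])
```

The baseline spends its sample budget uniformly and fixes the depth and sample count by hand. An adaptive, scale-free planner of the kind the abstract describes would instead have to choose where to sample without knowing the reward scale or the discount regime in advance.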

Robotics & Embodied AI, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
