This paper introduces Platypoos, a scale-free planning algorithm for environments with deterministic dynamics, stochastic rewards, and discounted returns. Platypoos adapts to the unknown scale and smoothness of the reward function, without requiring bounded rewards or knowledge of the optimal value function. The authors give a sample complexity analysis that improves on prior work and holds simultaneously over a broad range of discount factors and reward scales, and they establish a matching lower bound showing that the analysis is optimal up to constants.
Platypoos adapts to the unknown scale and smoothness of rewards, with a sample complexity analysis that a matching lower bound shows is optimal up to constants.
We address the problem of planning in an environment with deterministic dynamics and stochastic rewards with discounted returns. The optimal value function is not known, nor are the rewards bounded. We propose Platypoos, a simple scale-free planning algorithm that adapts to the unknown scale and smoothness of the reward function. We provide a sample complexity analysis for Platypoos that improves upon prior work and holds simultaneously over a broad range of discount factors and reward scales, without the algorithm knowing them. We also establish a matching lower bound showing our analysis is optimal up to constants.
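To make the setting concrete, the following is a minimal sketch of the problem the abstract describes, not of Platypoos itself: transitions are deterministic, rewards are noisy and unbounded, and the value of an open-loop action sequence is its expected discounted return. The toy chain environment, the reward mean `0.1 * state`, the Gaussian noise model, and all function names are assumptions made purely for illustration; none of them come from the paper.

```python
import random

def step(state, action):
    """Deterministic dynamics on a hypothetical integer chain: action 1 moves right, anything else moves left."""
    return state + (1 if action == 1 else -1)

def sample_reward(state, action, rng, noise):
    """Stochastic reward: an assumed mean of 0.1 * state plus Gaussian noise, so rewards are unbounded."""
    return 0.1 * state + rng.gauss(0.0, noise)

def estimate_return(start, actions, gamma, n_samples, rng, noise=1.0):
    """Monte Carlo estimate of the truncated discounted return of a fixed action sequence.

    Each of the n_samples rollouts follows the same deterministic path;
    only the sampled rewards differ between rollouts.
    """
    total = 0.0
    for _ in range(n_samples):
        state, ret, discount = start, 0.0, 1.0
        for a in actions:
            ret += discount * sample_reward(state, a, rng, noise)
            state = step(state, a)
            discount *= gamma
        total += ret
    return total / n_samples

if __name__ == "__main__":
    rng = random.Random(0)
    print(estimate_return(0, [1, 1, 1], gamma=0.5, n_samples=1000, rng=rng))
```

In this sketch the number of reward samples is fixed by hand. Choosing it well normally requires knowing the noise level and reward scale, which is exactly the knowledge a scale-free planner is meant to do without.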