This paper investigates the use of common random numbers (CRN) in simulation-based planning with rollouts to reduce variance in utility estimates. The authors derive a method that provably reduces the variance of relative utility estimates when simulations follow a rollout policy beyond a certain depth. Empirical results on synthetic tasks, a pension-disbursement task, and the game of Ludo show improved task performance with CRN.
Aligning random seeds across rollout simulations can significantly boost the performance of simulation-based planning, even in complex environments like Ludo.
Simulation-based planning with rollouts is a widely deployed technique for decision making in stochastic environments. The primary instrument of simulation-based planning is a sampling model, which is repeatedly called to generate trajectories and estimate the utilities of available actions. Among the actions thus explored, one with the maximum estimated utility is then executed. In this paper, we examine the effect of using common random numbers in the simulation process. We obtain a simple recipe for (provably) reducing variance in relative utility when simulations invoke a rollout policy beyond some depth. Experiments on synthetic tasks confirm that our scheme improves task performance. The broader significance of our innovation is apparent from two practical applications: (1) single-step lookahead planning in a pension-disbursement task, and (2) a deployment of the well-known UCT algorithm for the game of Ludo.
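The general idea of common random numbers can be illustrated with a minimal sketch. This is not the paper's scheme or its experimental setup: the toy environment, the constants `MEAN`, `SCALE`, and `HORIZON`, and the functions `rollout` and `paired_differences` are all hypothetical, chosen only to show how sharing a seed between two rollouts affects the estimated difference in utility between two actions.

```python
import random
import statistics

# Hypothetical toy task: choose a first action, then follow a fixed random rollout.
MEAN = {"a": 1.0, "b": 0.9}    # assumed expected immediate reward per action
SCALE = {"a": 1.0, "b": 0.5}   # assumed noise scale of the immediate reward
HORIZON = 20                   # assumed number of rollout steps after the first action


def rollout(action, seed):
    """Return the total reward of one simulated trajectory driven by `seed`."""
    rng = random.Random(seed)
    # Immediate reward depends on the action and on the first random draw.
    total = MEAN[action] + SCALE[action] * rng.gauss(0.0, 1.0)
    # The rollout phase adds noise that does not depend on the first action.
    for _ in range(HORIZON):
        total += rng.gauss(0.0, 1.0)
    return total


def paired_differences(n, common, master_seed=0):
    """Return n samples of utility(a) - utility(b).

    With common=True both rollouts in a pair reuse the same seed (CRN);
    with common=False each rollout draws its own seed.
    """
    seeds = random.Random(master_seed)
    diffs = []
    for _ in range(n):
        seed_a = seeds.getrandbits(32)
        seed_b = seed_a if common else seeds.getrandbits(32)
        diffs.append(rollout("a", seed_a) - rollout("b", seed_b))
    return diffs
```

Comparing `statistics.variance(paired_differences(2000, common=True))` with the same call using `common=False` should show a much smaller variance in the shared-seed case for this toy setup, because the rollout noise is identical in both trajectories and cancels in the difference. In richer environments the trajectories diverge after the first action, so the cancellation is only partial.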