Mar 8, 2026 | arXiv:2603.07844

Relating Reinforcement Learning to Dynamic Programming-Based Planning

Filip V. Georgiev, Kalle G. Timperi, Başak Sakçak, Steven M. LaValle

AI Summary

This paper investigates the relationship between reinforcement learning (RL) methods and dynamic programming-based planning algorithms such as value iteration and Dijkstra's algorithm. It establishes conditions under which cost minimization and reward maximization are equivalent, conditions under which single-shot goal termination and infinite-horizon episodic learning are equivalent, and scenarios in which discounting hinders goal achievement. The authors introduce a derandomized RL variant for performance comparisons and advocate optimizing true cost functions instead of relying on arbitrary parameters.

Key Contribution

Discounting can actively prevent goal achievement in reinforcement learning, challenging the common practice of using it for convergence.

Abstract

This paper bridges some of the gap between optimal planning and reinforcement learning (RL), both of which share roots in dynamic programming applied to sequential decision making or optimal control. Whereas planning typically favors deterministic models, goal termination, and cost minimization, RL tends to favor stochastic models, infinite-horizon discounting, and reward maximization in addition to learning-related parameters such as the learning rate and greediness factor. A derandomized version of RL is developed, analyzed, and implemented to yield performance comparisons with value iteration and Dijkstra's algorithm using simple planning models. Next, mathematical analysis shows: 1) conditions under which cost minimization and reward maximization are equivalent, 2) conditions for equivalence of single-shot goal termination and infinite-horizon episodic learning, and 3) conditions under which discounting causes goal achievement to fail. The paper then advocates for defining and optimizing true cost, rather than inserting arbitrary parameters to guide operations. Performance studies are then extended to the stochastic case, using planning-oriented criteria and comparing value iteration to RL with learning rates and greediness factors.
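As a rough illustration of point 3, the following sketch runs value iteration on a two-state toy problem. The model, the costs (1 to stay, 20 to reach the goal), and the function name `value_iteration` are assumptions made here for illustration and are not taken from the paper. With no discounting, loitering accumulates unbounded cost, so moving to the goal is optimal. With a discount factor of 0.9, loitering forever costs only 1 / (1 - 0.9) = 10, which is less than 20, so the discounted optimum never reaches the goal.

```python
def value_iteration(gamma, sweeps=1000):
    """Value iteration on a hypothetical two-state cost-minimization model.

    State 0 is the start; state 1 is the goal (terminal, cost-to-go 0).
    From the start, "stay" costs 1 and returns to the start,
    while "go" costs 20 and reaches the goal.
    """
    # action name -> (stage cost, next state); illustrative values only
    actions = {"stay": (1.0, 0), "go": (20.0, 1)}
    v = [0.0, 0.0]  # v[1] stays 0 because the goal is terminal

    for _ in range(sweeps):
        # Bellman update for the start state with discount factor gamma
        v[0] = min(cost + gamma * v[nxt] for cost, nxt in actions.values())

    # Greedy policy with respect to the computed cost-to-go
    policy = min(actions, key=lambda a: actions[a][0] + gamma * v[actions[a][1]])
    return v[0], policy


if __name__ == "__main__":
    for gamma in (1.0, 0.99, 0.9):
        value, policy = value_iteration(gamma)
        print(f"gamma={gamma}: cost-to-go={value:.3f}, policy at start={policy}")
```

In this toy model the goal is reached only when 20 < 1 / (1 - gamma), that is, when gamma exceeds 0.95. The threshold depends entirely on the assumed costs, which is one way to see why a discount factor chosen for convergence can change which behavior counts as optimal.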

Training Efficiency & Optimization | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
