This paper analyzes a soft Bellman residual minimization (BRM) objective with a weighted Lp-norm for solving Markov Decision Processes (MDPs) under linear function approximation. It demonstrates that increasing *p* in the Lp-norm aligns the BRM objective with the contraction geometry of the Bellman operator, thereby reducing error propagation. The analysis provides performance error bounds that explicitly connect residual minimization with Bellman contraction properties.
Aligning your Bellman residual minimization objective with the Bellman operator's contraction geometry provably improves performance in MDPs.
The problem of solving Markov decision processes remains a fundamental challenge, even in the linear function approximation setting. A key difficulty arises from a geometric mismatch: while the Bellman optimality operator is contractive in the L∞-norm, commonly used objectives such as projected value iteration and Bellman residual minimization rely on L2-based formulations. To enable gradient-based optimization, we consider a soft formulation of Bellman residual minimization and extend it to a generalized weighted Lp-norm. We show that this formulation aligns the optimization objective with the contraction geometry of the Bellman operator as *p* increases, and derive corresponding performance error bounds. Our analysis provides a principled connection between residual minimization and Bellman contraction, leading to improved control of error propagation while remaining compatible with gradient-based optimization.
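To make the abstract's ingredients concrete, the following is a minimal sketch of soft Bellman residual minimization with a weighted Lp objective under linear features. It is not the paper's implementation: the toy MDP, the log-sum-exp soft operator with temperature `tau`, the uniform state weighting `mu`, the step size, and all variable names are illustrative assumptions. The only point is that the soft operator is differentiable, so the Lp residual objective admits an exact gradient for gradient-based optimization.

```python
import numpy as np

# Hypothetical toy MDP (random transitions/rewards) -- illustrative only.
rng = np.random.default_rng(0)
nS, nA, d = 6, 3, 4          # states, actions, feature dimension
gamma, tau, p = 0.9, 0.5, 4  # discount, softmax temperature, Lp exponent

P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] is a distribution over s'
R = rng.uniform(size=(nS, nA))                 # rewards r(s, a)
Phi = rng.normal(size=(nS, d))                 # linear features, V = Phi @ w
mu = np.full(nS, 1.0 / nS)                     # state weighting in the Lp objective

def soft_bellman(V):
    # Soft (log-sum-exp) Bellman optimality operator with temperature tau.
    Q = R + gamma * P @ V                      # Q[s, a]
    return tau * np.log(np.exp(Q / tau).sum(axis=1))

def loss(w):
    # Weighted Lp-norm (raised to the p-th power) of the soft Bellman residual.
    V = Phi @ w
    delta = soft_bellman(V) - V
    return float(mu @ np.abs(delta) ** p)

def grad(w):
    # Exact gradient of the soft BRM objective; the softmax weights pi
    # come from differentiating the log-sum-exp operator.
    V = Phi @ w
    Q = R + gamma * P @ V
    pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    pi /= pi.sum(axis=1, keepdims=True)
    delta = soft_bellman(V) - V
    # d(TV)(s)/dw = gamma * sum_a pi[s,a] * sum_s' P[s,a,s'] * phi(s')
    dT = gamma * np.einsum("sa,saz,zd->sd", pi, P, Phi)
    ddelta = dT - Phi                          # d delta(s) / dw
    return (mu * p * np.abs(delta) ** (p - 1) * np.sign(delta)) @ ddelta

w = np.zeros(d)
for _ in range(5000):                          # plain gradient descent
    w -= 0.02 * grad(w)
print(f"final weighted Lp^p residual: {loss(w):.6f}")
```

Raising `p` toward infinity pushes the objective toward the max-norm residual, which is the norm in which the (soft) Bellman operator contracts; the sketch uses a finite `p` so the objective stays smooth and gradient-friendly.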