Guaranteeing zero unsafe state visits during RL training is now possible, opening the door to deploying RL agents in previously inaccessible high-risk environments.
Q-learning converges faster than previously thought, thanks to a tighter bound derived from a novel stochastic switching-system representation of the Bellman error.
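As background for the bound above, here is a minimal sketch of the tabular Q-learning update, where `bellman_error` is the quantity a switching-system analysis would track. The toy Q-table, step size `alpha`, and discount `gamma` are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update; the temporal-difference term
    computed here is the Bellman error the convergence bound concerns."""
    bellman_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * bellman_error
    return bellman_error

# Toy usage: a 5-state, 2-action Q-table updated from one transition.
Q = np.zeros((5, 2))
err = q_learning_step(Q, s=0, a=1, r=1.0, s_next=2)
```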
Aligning your Bellman residual minimization objective with the Bellman operator's contraction geometry provably improves performance in MDPs.
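For context on the objective named above, a minimal sketch of the standard (unweighted) mean squared Bellman residual for a linear Q-function follows. The feature map `phi`, the transition format, and the linear parameterization are assumptions for illustration; the geometry-aligned version the summary describes is not reproduced here.

```python
import numpy as np

def bellman_residual_loss(theta, phi, transitions, gamma=0.99):
    """Mean squared Bellman residual for a linear Q-function
    Q(s, a) = phi(s, a) @ theta.

    `phi` maps (state, action) to a feature vector; `transitions` is
    a list of (s, a, r, s_next, actions) tuples. All names here are
    illustrative, not the paper's setup."""
    residuals = []
    for s, a, r, s_next, actions in transitions:
        q_sa = phi(s, a) @ theta
        target = r + gamma * max(phi(s_next, b) @ theta for b in actions)
        residuals.append((q_sa - target) ** 2)
    return np.mean(residuals)
```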
Stackelberg Q-learning finally gets finite-time guarantees, paving the way for more reliable multi-agent RL in complex, general-sum games.