Search papers, labs, and topics across Lattice.
2
0
3
Decomposing Bellman values into a graph of simpler objectives lets agents master complex, high-dimensional tasks with less tuning and better safety.
Forget hand-engineering initial conditions for robust RL: this method *learns* which conditions are feasible while simultaneously training a safe policy.