University of Oxford
Learning dense rewards from expert demonstrations allows for over 90% success in complex manipulation tasks, outperforming traditional RL methods.
GP-PSRL can achieve sublinear regret in continuous control even with unbounded state spaces, resolving prior theoretical limitations and opening the door to more complex RL settings.