Search papers, labs, and topics across Lattice.
1
0
2
Stop hand-tuning reward learning losses: MAVRL learns a shared reward function from diverse feedback signals by treating each as a likelihood within a Bayesian inference framework.