By "imagining" new scenarios and asking "What if this were the true preference?", CRED actively designs environments and trajectories to expose differences between competing reward functions, dramatically improving preference learning.
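The summary gives only the intuition. As a rough illustration of disagreement-driven query selection, the following is a minimal sketch and not CRED's published algorithm. It makes several assumptions. Rewards are linear over trajectory features. Preferences follow a Bradley-Terry model. The learner holds a small finite set of candidate reward weights with a belief over them. Trajectories come from a fixed pool, so environment design is omitted entirely. All function names (`imagined_best`, `select_query`, `update_belief`, etc.) are hypothetical. For each candidate reward, the sketch asks "what if this were the true preference?" by finding the trajectory that candidate would pick. It then queries the pair of those imagined trajectories on which the candidates disagree most.

```python
import itertools
import math


def reward(w, phi):
    """Linear reward r_w(tau) = w . phi(tau) (an assumption of this sketch)."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def pref_prob(w, phi_a, phi_b):
    """Bradley-Terry probability that trajectory a is preferred to b under weights w."""
    return 1.0 / (1.0 + math.exp(-(reward(w, phi_a) - reward(w, phi_b))))


def imagined_best(w, trajectories):
    """'What if w were the true preference?': index of the trajectory w ranks highest."""
    return max(range(len(trajectories)), key=lambda i: reward(w, trajectories[i]))


def disagreement(hypotheses, belief, phi_a, phi_b):
    """Belief-weighted variance of the hypotheses' predicted P(a preferred to b)."""
    probs = [pref_prob(w, phi_a, phi_b) for w in hypotheses]
    mean = sum(b * p for b, p in zip(belief, probs))
    return sum(b * (p - mean) ** 2 for b, p in zip(belief, probs))


def select_query(hypotheses, belief, trajectories):
    """Compare the trajectories each hypothesis would choose; return the most contested pair."""
    imagined = sorted({imagined_best(w, trajectories) for w in hypotheses})
    if len(imagined) < 2:
        return None  # every hypothesis picks the same trajectory: nothing to disambiguate
    return max(
        itertools.combinations(imagined, 2),
        key=lambda ij: disagreement(
            hypotheses, belief, trajectories[ij[0]], trajectories[ij[1]]
        ),
    )


def update_belief(hypotheses, belief, phi_a, phi_b, a_preferred):
    """Bayesian update of the belief over hypotheses after one preference answer."""
    post = []
    for w, b in zip(hypotheses, belief):
        p = pref_prob(w, phi_a, phi_b)
        post.append(b * (p if a_preferred else 1.0 - p))
    total = sum(post)
    return [x / total for x in post]


# Usage: three candidate rewards over 2-D features, four trajectories in the pool.
hypotheses = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
belief = [1 / 3, 1 / 3, 1 / 3]
trajectories = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (0.6, 0.6)]
i, j = select_query(hypotheses, belief, trajectories)  # -> (0, 1)
belief = update_belief(hypotheses, belief, trajectories[i], trajectories[j], a_preferred=True)
```

Here the first two candidates each imagine a different extreme trajectory as optimal, so the pair (0, 1) splits them most sharply. A single answer to that query shifts the belief strongly toward one of them. Scoring by variance of predicted preferences is just one simple disagreement measure. Information-gain criteria are a common alternative.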