Singapore Management University
Forget reward and cost models: PreSa directly learns safe policies from offline preferences and safety labels, outperforming traditional constrained RL approaches.