Princeton University
Imperfect rewards can actually *help* policy gradient methods escape local optima, challenging the conventional wisdom that reward accuracy is always paramount.