UC Davis
PRIME reveals a crucial precursor to reward hacking that can be used to predict and adapt to misalignment before it manifests, offering a new lens on alignment risks in RL systems.