RLHF and DPO are surprisingly vulnerable to data poisoning, with even a small number of carefully crafted preferences capable of steering the learned policy towards a desired (potentially harmful) target.