Human and AI feedback in RLHF is surprisingly susceptible to "choice blindness": manipulated preferences often go unnoticed, undermining the reliability of alignment signals.