Even with noisy or misspecified preference feedback, LLMs can be robustly aligned online by penalizing sensitivity to oracle uncertainty.