Mitigate the brittleness of RLHF by explicitly controlling for disagreement and tail risk during inference, without retraining, using a KL-robust optimization framework.