RLHF's implicit preference aggregation can now be explicitly controlled via differentiable loss functions corresponding to different voting rules, enabling principled trade-offs between axiomatic guarantees and optimization stability.
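To make the idea of "a loss per voting rule" concrete, here is a minimal sketch under stated assumptions; it is not the losses from the summarized work. It assumes a complete matrix `pref[i][j]` giving the fraction of annotators who prefer response `i` to response `j`, and fits one scalar reward per response. With the raw fractions as targets, the pairwise cross-entropy is the usual Bradley-Terry objective, whose minimizer in this setup orders responses by their Borda-style row sums. The helper `soft_majority_targets` and its `temperature` parameter are hypothetical: they sharpen each fraction toward the majority outcome, which pushes the fit toward a Copeland-like (pairwise majority) ordering instead.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def pairwise_loss(rewards, targets):
    """Cross-entropy between sigmoid(r_i - r_j) and targets[i][j], over pairs i < j."""
    total = 0.0
    n = len(rewards)
    for i in range(n):
        for j in range(i + 1, n):
            d = rewards[i] - rewards[j]
            t = targets[i][j]
            total -= t * log_sigmoid(d) + (1.0 - t) * log_sigmoid(-d)
    return total


def soft_majority_targets(pref, temperature):
    """Hypothetical surrogate: sharpen preference fractions toward 0 or 1."""
    n = len(pref)
    return [
        [1.0 / (1.0 + math.exp(-(pref[i][j] - 0.5) / temperature)) for j in range(n)]
        for i in range(n)
    ]


def fit_rewards(targets, steps=2000, lr=0.5):
    """Plain gradient descent on pairwise_loss; assumes targets[i][j] + targets[j][i] == 1."""
    n = len(targets)
    r = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j:
                    s = 1.0 / (1.0 + math.exp(-(r[i] - r[j])))
                    grad[i] += s - targets[i][j]
        r = [ri - lr * gi for ri, gi in zip(r, grad)]
    return r


# Illustrative data (made up): response 0 wins every head-to-head by a small
# margin, while response 1 has the largest total support across its pairs.
pref = [
    [0.5, 0.6, 0.6],
    [0.4, 0.5, 0.9],
    [0.4, 0.1, 0.5],
]

bt_rewards = fit_rewards(pref)
majority_rewards = fit_rewards(soft_majority_targets(pref, temperature=0.05))
bt_top = max(range(3), key=lambda i: bt_rewards[i])
majority_top = max(range(3), key=lambda i: majority_rewards[i])
```

In this toy example the two objectives disagree about the top response: the Bradley-Terry fit favors the response with the largest total support, while the sharpened targets favor the one that wins every pairwise majority. As `temperature` shrinks, the targets approach 0 or 1 and the minimizing reward gaps grow without bound, which is one way a trade-off between a rule's guarantees and optimization stability could show up in practice.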