Ditch KL divergence in RLHF: Wasserstein Policy Regularization uses token geometry to align LLMs better with human preferences.