Preference optimization objectives, despite their diversity, can be steered toward disentangled training dynamics, in which the rejected response is pushed down without also suppressing the chosen one, simply by satisfying a "disentanglement band" condition.