University of Auckland
LLMs can get up to 6x more logically consistent without human feedback, simply by fusing NLI scores into the DPO training loop.
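
The summary above does not say how the NLI scores enter the DPO loop, so the following is only a minimal sketch under one assumption: an NLI model's consistency scores stand in for human preference labels when picking the chosen and rejected responses, and the standard DPO loss is then applied to that pair. The helper `pair_from_nli`, the `nli_scores` input, and the example numbers are hypothetical and not taken from the paper.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pair_from_nli(responses, nli_scores):
    """Pick a (chosen, rejected) pair from candidate responses.

    ASSUMPTION: nli_scores[i] is a consistency score for responses[i]
    produced by some NLI model (higher = more logically consistent).
    No human labels are involved.
    """
    best = max(range(len(responses)), key=lambda i: nli_scores[i])
    worst = min(range(len(responses)), key=lambda i: nli_scores[i])
    return responses[best], responses[worst]


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the trained policy and the frozen reference model."""
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(margin)) is the same as softplus(-margin)
    return softplus(-margin)


# Hypothetical candidates and scores, for illustration only.
chosen, rejected = pair_from_nli(["a", "b", "c"], [0.2, 0.9, 0.1])
loss = dpo_loss(-1.0, -5.0, -2.0, -2.0, beta=0.1)
```

In a real training loop the log-probabilities would come from the policy and reference language models; the scalar version here only shows how an NLI-selected pair feeds the loss.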