Vanderbilt University
An LLM's own attention can be used to significantly improve preference optimization, outperforming existing methods without a separate reward model or heuristic token weighting.