Indian Institute of Technology Roorkee, Lossfunk
{shivank_g@mfs, ayush_s@mt, shweta_s@mfs}.iitr.ac.in, paras@lossfunk.com
*Equal contribution.
Abstract
Ditch the expensive reward model: your LLM already knows what it likes, and IPO shows you how to use that for preference optimization.