Forget RLHF's quirks: aligning LLMs is fundamentally a distribution learning problem, and preference distillation offers a theoretically sound and empirically strong alternative.
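To make the distribution-learning framing concrete, here is a minimal sketch assuming the standard KL-regularized alignment objective; the symbols ($\pi_{\theta}$ the policy being trained, $\pi_{\mathrm{ref}}$ the reference model, $r$ a reward or preference model, $\beta$ the regularization strength, $Z(x)$ a normalizer) are standard in the RLHF literature, not taken from this page. Under that objective the optimal aligned policy has a closed form, and preference distillation can be read as directly fitting that target distribution rather than approaching it through a policy-gradient loop:

$$
\pi^{*}(y \mid x) \;=\; \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y \mid x)\,\exp\!\big(r(x, y)/\beta\big),
\qquad
\min_{\theta}\;\mathbb{E}_{x}\Big[\mathrm{KL}\big(\pi^{*}(\cdot \mid x)\,\big\|\,\pi_{\theta}(\cdot \mid x)\big)\Big].
$$

Read this way, alignment is supervised density estimation against $\pi^{*}$, which is the sense in which it is "fundamentally a distribution learning problem."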