AlphaPO unlocks 7-50% better LLM alignment by showing that reward function *shape* is a surprisingly powerful lever in Direct Alignment Algorithms.