RLHF's reliance on gradient-based alignment inherently limits how deeply it shapes a sequence: the update signal concentrates on early tokens and neglects later, potentially harmful, contextual dependencies.
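A toy probe of the claimed mechanism (my sketch, not the paper's setup): with a single sequence-level reward, REINFORCE weights every position's grad-log-prob by the same scalar, so the per-position gradient norm shows where the update mass actually lands. The GRU policy, sizes, and reward below are illustrative assumptions; whether the mass concentrates on early tokens, as claimed, is the empirical question the probe asks.

```python
# Toy probe, not the paper's method: per-position policy-gradient mass
# under a sequence-level reward. Model and sizes are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d, steps = 32, 64, 16
embed = nn.Embedding(vocab, d)
rnn = nn.GRU(d, d, batch_first=True)
head = nn.Linear(d, vocab)
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())

# Roll out a sequence from the (untrained) policy, keeping each step's log-prob.
tok = torch.randint(vocab, (1, 1))
h, logps = None, []
for _ in range(steps):
    out, h = rnn(embed(tok), h)
    dist = torch.distributions.Categorical(logits=head(out[:, -1]))
    tok = dist.sample().unsqueeze(0)              # (1, 1) for the next embed call
    logps.append(dist.log_prob(tok.squeeze(0)))

# A sequence-level reward gives every position the same scalar weight, so
# any per-position asymmetry in the update comes from the model itself.
for t, lp in enumerate(logps):
    grads = torch.autograd.grad(lp.sum(), params, retain_graph=True, allow_unused=True)
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads if g is not None))
    print(f"position {t:2d}  grad norm {norm.item():.3f}")
```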
Debate between AI models hits a phase transition: it's useless when they know the same things, but becomes essential as their knowledge diverges.
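The shape of that transition shows up even in a toy model (my construction, not the paper's): give each debater a random subset of the facts a question needs, let a judge decide correctly only when enough distinct facts surface, and sweep the knowledge overlap. All parameters below are assumptions.

```python
# Toy debate model, not the paper's: debate gain vs. knowledge overlap.
import random

random.seed(0)
N_FACTS, K, THRESHOLD = 40, 15, 22   # assumed: facts per question, facts per
                                     # debater, facts the judge needs to decide

def debate_gain(p: float, trials: int = 2000) -> float:
    """Extra judge accuracy from a second debater whose knowledge overlaps with prob p."""
    pair_wins = solo_wins = 0
    for _ in range(trials):
        a = set(random.sample(range(N_FACTS), K))
        b = {f for f in a if random.random() < p}     # shared knowledge
        pool = [f for f in range(N_FACTS) if f not in b]
        b |= set(random.sample(pool, K - len(b)))     # independent fill
        solo_wins += len(a) >= THRESHOLD
        pair_wins += len(a | b) >= THRESHOLD
    return (pair_wins - solo_wins) / trials

for p in [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.0]:
    print(f"overlap prob {p:.1f}  debate gain {debate_gain(p):.2f}")
```

The threshold judge is what makes the gain curve sharp rather than linear in this toy: duplicated facts contribute nothing until the union of the two debaters' knowledge clears the evidence bar all at once.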
RLAIF's apparent magic comes from constitutional prompts acting as a projection operator, selectively activating pre-encoded human values within the model's representation space.
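A minimal linear-algebra illustration of what "projection operator" entails (random stand-ins, not the paper's learned directions): an orthogonal projector onto a value subspace is idempotent, leaves the value-aligned component intact, and never amplifies a hidden state.

```python
# Toy illustration, not the paper's method: the basis V is a random
# stand-in for "pre-encoded value" directions in the hidden space.
import numpy as np

rng = np.random.default_rng(0)
d, k = 768, 8                      # hidden size, value-subspace rank (assumed)
V, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal value basis
P = V @ V.T                        # orthogonal projector onto span(V)

h = rng.standard_normal(d)         # a hidden state before the prompt
h_proj = P @ h                     # component aligned with the value subspace

print(np.allclose(P @ P, P))                        # idempotence: True
print(np.allclose(V.T @ (h - h_proj), 0))           # residual orthogonal to values: True
print(np.linalg.norm(h_proj) <= np.linalg.norm(h))  # projection never amplifies: True
```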