RLHF's reliance on gradient-based alignment inherently limits how deeply it shapes a sequence: the update signal concentrates on early tokens and neglects later, potentially harmful, contextual dependencies.
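A toy probe of the claimed mechanism (my sketch, not the paper's setup): with a single sequence-level reward, REINFORCE weights every position's grad-log-prob by the same scalar, so the per-position gradient norm shows where the update mass actually lands. The GRU policy, sizes, and reward below are illustrative assumptions; whether the mass concentrates on early tokens, as claimed, is the empirical question the probe asks.

```python
# Toy probe, not the paper's method: per-position policy-gradient mass
# under a sequence-level reward. Model and sizes are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d, steps = 32, 64, 16
embed = nn.Embedding(vocab, d)
rnn = nn.GRU(d, d, batch_first=True)
head = nn.Linear(d, vocab)
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())

# Roll out a sequence from the (untrained) policy, keeping each step's log-prob.
tok = torch.randint(vocab, (1, 1))
h, logps = None, []
for _ in range(steps):
    out, h = rnn(embed(tok), h)
    dist = torch.distributions.Categorical(logits=head(out[:, -1]))
    tok = dist.sample().unsqueeze(0)              # (1, 1) for the next embed call
    logps.append(dist.log_prob(tok.squeeze(0)))

# A sequence-level reward gives every position the same scalar weight, so
# any per-position asymmetry in the update comes from the model itself.
for t, lp in enumerate(logps):
    grads = torch.autograd.grad(lp.sum(), params, retain_graph=True, allow_unused=True)
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads if g is not None))
    print(f"position {t:2d}  grad norm {norm.item():.3f}")
```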
Debate between AI models hits a phase transition: it's useless when they know the same things, but becomes essential as their knowledge diverges.
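The shape of that transition shows up even in a toy model (my construction, not the paper's): give each debater a random subset of the facts a question needs, let a judge decide correctly only when enough distinct facts surface, and sweep the knowledge overlap. All parameters below are assumptions.

```python
# Toy debate model, not the paper's: debate gain vs. knowledge overlap.
import random

random.seed(0)
N_FACTS, K, THRESHOLD = 40, 15, 22   # assumed: facts per question, facts per
                                     # debater, facts the judge needs to decide

def debate_gain(p: float, trials: int = 2000) -> float:
    """Extra judge accuracy from a second debater whose knowledge overlaps with prob p."""
    pair_wins = solo_wins = 0
    for _ in range(trials):
        a = set(random.sample(range(N_FACTS), K))
        b = {f for f in a if random.random() < p}     # shared knowledge
        pool = [f for f in range(N_FACTS) if f not in b]
        b |= set(random.sample(pool, K - len(b)))     # independent fill
        solo_wins += len(a) >= THRESHOLD
        pair_wins += len(a | b) >= THRESHOLD
    return (pair_wins - solo_wins) / trials

for p in [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.0]:
    print(f"overlap prob {p:.1f}  debate gain {debate_gain(p):.2f}")
```

The threshold judge is what makes the gain curve sharp rather than linear in this toy: duplicated facts contribute nothing until the union of the two debaters' knowledge clears the evidence bar all at once.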
RLAIF's apparent magic comes from constitutional prompts acting as a projection operator, selectively activating pre-encoded human values within the model's representation space.
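A minimal linear-algebra illustration of what "projection operator" entails (random stand-ins, not the paper's learned directions): an orthogonal projector onto a value subspace is idempotent, leaves the value-aligned component intact, and never amplifies a hidden state.

```python
# Toy illustration, not the paper's method: the basis V is a random
# stand-in for "pre-encoded value" directions in the hidden space.
import numpy as np

rng = np.random.default_rng(0)
d, k = 768, 8                      # hidden size, value-subspace rank (assumed)
V, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal value basis
P = V @ V.T                        # orthogonal projector onto span(V)

h = rng.standard_normal(d)         # a hidden state before the prompt
h_proj = P @ h                     # component aligned with the value subspace

print(np.allclose(P @ P, P))                        # idempotence: True
print(np.allclose(V.T @ (h - h_proj), 0))           # residual orthogonal to values: True
print(np.linalg.norm(h_proj) <= np.linalg.norm(h))  # projection never amplifies: True
```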