Human and AI feedback in RLHF is surprisingly susceptible to "choice blindness": manipulated preferences often go unnoticed, undermining the reliability of alignment signals.