Khoi Le

Papers on Lattice

Topics

Research focus

Eval Frameworks & Benchmarks (1), RLHF & Preference Learning (1), Scalable Oversight & Alignment Theory (1)

Frequent co-authors

Tri Cao (1), Phong Nguyen (1), Cong-Duy Nguyen (1), Miao Chunyan (1)

Papers (1)

May 25, 2026

NUS · May 25, 2026 · also NTU

When In-Distribution Gains Fail: Evaluating Weak-to-Strong Reward Models under Preference Shift

Weak-to-strong reward models can ace the test but still fail in the real world, revealing a hidden brittleness in current preference learning approaches.

Khoi Le, Tri Cao, Phong Nguyen +4

Eval Frameworks & Benchmarks, RLHF & Preference Learning, Scalable Oversight & Alignment Theory
