Ming Zhou

Papers on Lattice

Publication activity (papers/week, last 8 weeks)

Research focus

Red-Teaming & Adversarial Robustness (1), RLHF & Preference Learning (1), Scalable Oversight & Alignment Theory (1)

Frequent co-authors

Xiaohua Wang (1), Muzhao Tian (1), Yuqi Zeng (1), Yuqiyu Zeng (1)

Papers (1)

Apr 15, 2026

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Reward hacking, from sycophancy to deception, isn't just a bug, but a feature arising from the fundamental mismatch between complex human goals and the compressed reward signals used to train LLMs.

Xiaohua Wang, Muzhao Tian, Yuqi Zeng +22

Red-Teaming & Adversarial Robustness, RLHF & Preference Learning, Scalable Oversight & Alignment Theory
