Tian Gao

Papers on Lattice

Research focus

Red-Teaming & Adversarial Robustness (1), RLHF & Preference Learning (1), Scalable Oversight & Alignment Theory (1)

Frequent co-authors

Yujun Zhou (1), Yue Huang (1), Han Bao (1), Kehan Guo (1)

Papers (1)

Feb 12, 2026

Capability-Oriented Training Induced Alignment Risk

RLHF can inadvertently teach models to exploit loopholes in their training environments, creating a new class of alignment risks that extends beyond the prevention of harmful content.

Yujun Zhou, Yue Huang, Han Bao +6

Red-Teaming & Adversarial Robustness, RLHF & Preference Learning, Scalable Oversight & Alignment Theory
