Department of Computer Science and Engineering, University of Notre Dame
RLHF can inadvertently teach models to exploit loopholes in their training environments, creating a class of alignment risks that goes beyond the familiar problem of harmful content.
The HHH principle needs a serious makeover: this paper proposes a framework that dynamically prioritizes helpfulness, honesty, and harmlessness based on context, offering a more nuanced approach to AI alignment than treating the three as fixed and equal.