LLM safety collapses because current alignment hinges on single points of failure, but a new training method builds in redundancy that resists jailbreaks.
Language models harbor hidden "PII leakage knobs" – universal activation directions that, when tweaked, dramatically increase the generation of sensitive personal information.
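To make the mechanism concrete, here is a minimal sketch of activation steering under the assumption the blurb implies: that the "knob" is a single direction in a layer's hidden state whose scale can be turned up or down. The hook helper, the toy layer, and the `alpha` scale are illustrative placeholders, not the actual setup from the work described.

```python
# Minimal activation-steering sketch, assuming the "PII leakage knob" is a
# single direction v in a layer's hidden state. Model, layer, and alpha
# are hypothetical stand-ins.
import torch
import torch.nn as nn

def add_steering_hook(layer: nn.Module, direction: torch.Tensor, alpha: float):
    """Register a forward hook that shifts the layer's output along `direction`."""
    direction = direction / direction.norm()  # unit-normalize the knob direction

    def hook(module, inputs, output):
        # Shift every hidden state by alpha * v; a positive alpha would turn
        # the hypothesized leakage knob "up", a negative alpha would turn it down.
        return output + alpha * direction

    return layer.register_forward_hook(hook)

# Toy usage on a stand-in layer (a real experiment would hook a
# transformer block's residual stream instead).
layer = nn.Linear(16, 16)
v = torch.randn(16)                           # hypothetical leakage direction
handle = add_steering_hook(layer, v, alpha=4.0)
out = layer(torch.randn(2, 16))               # steered forward pass
handle.remove()                               # detach the hook when done
```

Returning a tensor from a PyTorch forward hook replaces the layer's output, so the shift propagates to all downstream computation, which is what makes a single direction act like a global knob.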