Kaden Zheng

Harvard University

Papers on Lattice

Total citations

Topics

Research focus

Constitutional AI & AI Ethics (1) · Inference & Quantization (1) · Red-Teaming & Adversarial Robustness (1)

Frequent co-authors

Hadas Orgad (1) · Boyi Wei (1) · Martin Wattenberg (1) · Seraphina Goldfarb-Tarrant (1)

Papers (1)

Apr 10, 2026 · also Cohere, Princeton

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

LLMs' harmful outputs stem from a surprisingly compact and unified set of weights, suggesting a fundamental, addressable structure underlying even emergent misalignment.

Hadas Orgad, Boyi Wei, Kaden Zheng +2

Constitutional AI & AI Ethics · Inference & Quantization · Red-Teaming & Adversarial Robustness
