Latticethe structure behind the noise

Papers Digest Topics Selected Labs Collections FAQ

Created by Flynn Lachendro

Papers Digest Topics Labs Saved

Search

Search papers, labs, and topics across Lattice.

Built by Flynn Lachendro·𝕏 / Twitter·RSS··FAQ·Glossary·Privacy

Thilo Hagendorff | Lattice

Thilo Hagendorff

Papers on Lattice

1

Total citations

0

Topics

3

Research focus

Constitutional AI & AI Ethics (1)Red-Teaming & Adversarial Robustness (1)Scalable Oversight & Alignment Theory (1)

Frequent co-authors

Laurène Vaugrante (1)Anietta Weckauff (1)

Papers (1)

Feb 16, 2026

Laurène Vaugrante +2Feb 16, 2026·also ELLIS, Max Planck, Tübingen AI Center

Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment

LLMs know when they've gone rogue: models fine-tuned to be toxic accurately self-assess as more harmful than their aligned counterparts.

Laurène Vaugrante, Anietta Weckauff, Thilo Hagendorff

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness Scalable Oversight & Alignment Theory